Utility Functions
Collection of utility functions for data processing, model construction, and analysis.
Data Processing Functions
normalize_histogram
normalize_histogram(histogram::Vector{Float64}) -> Vector{Float64}Normalize a histogram to create a probability distribution.
Arguments:
histogram: Vector of histogram counts
Returns:
- Normalized probability distribution (sums to 1.0)
Example:
# Normalize RNA count histogram
raw_counts = [100, 150, 200, 120, 80, 50]
prob_dist = normalize_histogram(raw_counts)
println("Sum: ", sum(prob_dist)) # Should be 1.0make_array
make_array(data) -> ArrayConvert various data types to arrays for processing.
Arguments:
data: Data to convert (Vector, DataFrame, etc.)
Returns:
- Array representation of the data
Example:
# Convert DataFrame column to array
df = DataFrame(counts = [1, 2, 3, 4, 5])
arr = make_array(df.counts)make_mat
make_mat(data) -> MatrixConvert data to matrix format for analysis.
Arguments:
data: Data to convert
Returns:
- Matrix representation of the data
Example:
# Convert vector of vectors to matrix
traces = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
mat = make_mat(traces)digit_vector
digit_vector(number::Int, base::Int=10) -> Vector{Int}Convert a number to a vector of digits in specified base.
Arguments:
number: Number to convertbase: Base for conversion (default: 10)
Returns:
- Vector of digits
Example:
# Convert number to digits
digits = digit_vector(12345) # [1, 2, 3, 4, 5]
binary = digit_vector(10, 2) # [1, 0, 1, 0]Model Construction Functions
prepare_rates
prepare_rates(rates::Vector{Float64}, model::AbstractModel) -> Vector{Float64}Prepare rate parameters for model fitting by applying transformations.
Arguments:
rates: Raw rate parametersmodel: Model structure
Returns:
- Transformed rates ready for fitting
Example:
# Prepare rates for fitting
raw_rates = [0.1, 0.2, 0.3]
prepared = prepare_rates(raw_rates, model)get_rates
get_rates(parameters::Vector{Float64}, model::AbstractModel, log_scale::Bool=true) -> Vector{Float64}Extract rate parameters from fitted parameters.
Arguments:
parameters: Fitted parameter vectormodel: Model structurelog_scale: Whether parameters are in log scale
Returns:
- Rate parameters
Example:
# Extract rates from fitted parameters
fitted_params = [log(0.1), log(0.2), log(0.3)]
rates = get_rates(fitted_params, model, true)get_param
get_param(parameters::Vector{Float64}, model::AbstractModel, param_type::String) -> Vector{Float64}Extract specific parameter types from fitted parameters.
Arguments:
parameters: Fitted parameter vectormodel: Model structureparam_type: Type of parameters to extract ("rates", "noise", "coupling")
Returns:
- Requested parameters
Example:
# Extract noise parameters
noise_params = get_param(fitted_params, model, "noise")num_rates
num_rates(transitions::Tuple, R::Int, S::Int, insertstep::Int) -> IntCount the number of transition rates in a model.
Arguments:
transitions: Model transitionsR: Number of pre-RNA stepsS: Number of splice sitesinsertstep: Reporter insertion step
Returns:
- Total number of rates
Example:
# Count rates for a GRS model
n_rates = num_rates(([1,2], [2,1]), 3, 2, 1)numallparameters
num_all_parameters(transitions::Tuple, R::Int, S::Int, insertstep::Int,
reporter, coupling::Tuple, grid) -> IntCount total parameters including rates, noise, coupling, and grid parameters.
Arguments:
transitions: Model transitionsR,S,insertstep: Model structurereporter: Reporter configurationcoupling: Coupling specificationgrid: Grid specification
Returns:
- Total parameter count
Example:
# Count all parameters
n_total = num_all_parameters(transitions, 2, 1, 1, reporter, coupling, grid)Statistical Functions
prob_Gaussian
prob_Gaussian(y::Float64, μ::Float64, σ::Float64) -> Float64Calculate Gaussian probability density.
Arguments:
y: Observed valueμ: Meanσ: Standard deviation
Returns:
- Probability density
Example:
# Calculate Gaussian probability
prob = prob_Gaussian(2.0, 1.5, 0.5)probGaussiangrid
prob_Gaussian_grid(y::Vector{Float64}, μ::Vector{Float64}, σ::Vector{Float64}) -> Vector{Float64}Calculate Gaussian probabilities on a grid.
Arguments:
y: Observed valuesμ: Mean valuesσ: Standard deviations
Returns:
- Vector of probabilities
Example:
# Calculate probabilities on grid
y_vals = [1.0, 2.0, 3.0]
mu_vals = [1.1, 2.1, 2.9]
sigma_vals = [0.2, 0.3, 0.4]
probs = prob_Gaussian_grid(y_vals, mu_vals, sigma_vals)mean_elongationtime
mean_elongationtime(rates::Vector{Float64}, R::Int) -> Float64Calculate mean elongation time for pre-RNA steps.
Arguments:
rates: Rate parametersR: Number of pre-RNA steps
Returns:
- Mean elongation time
Example:
# Calculate elongation time
rates = [0.1, 0.2, 0.3, 0.4]
elong_time = mean_elongationtime(rates, 2)on_states
on_states(G::Int, R::Int, S::Int, insertstep::Int) -> Vector{Int}Identify transcriptionally active states.
Arguments:
G: Number of gene statesR: Number of pre-RNA stepsS: Number of splice sitesinsertstep: Reporter insertion step
Returns:
- Vector of active state indices
Example:
# Find active states
active = on_states(2, 3, 1, 1)source_states
source_states(transitions::Tuple) -> Vector{Int}Identify source states for transitions.
Arguments:
transitions: Model transitions
Returns:
- Vector of source state indices
Example:
# Find source states
sources = source_states(([1,2], [2,1]))File I/O Functions
folder_path
folder_path(folder::String, root::String=".", subfolder::String=""; make::Bool=false) -> StringConstruct folder paths with optional creation.
Arguments:
folder: Folder nameroot: Root directorysubfolder: Subfolder namemake: Whether to create the folder
Returns:
- Full folder path
Example:
# Create results folder path
results_path = folder_path("results", ".", "analysis", make=true)folder_setup
folder_setup(base_path::String, folders::Vector{String}) -> NothingSet up directory structure for analysis.
Arguments:
base_path: Base directoryfolders: Vector of folder names to create
Example:
# Set up analysis folders
folder_setup("project", ["data", "results", "figures"])datapdf
datapdf(data::AbstractExperimentalData, filename::String) -> NothingGenerate PDF summary of data.
Arguments:
data: Data structurefilename: Output filename
Example:
# Generate data summary PDF
datapdf(rna_data, "data_summary.pdf")make_dataframes
make_dataframes(results, model::AbstractModel) -> DataFrameCreate DataFrames from analysis results.
Arguments:
results: Analysis resultsmodel: Model structure
Returns:
- DataFrame with results
Example:
# Convert results to DataFrame
df = make_dataframes(fit_results, model)writedataframesonly
write_dataframes_only(dataframes::Vector{DataFrame}, folder::String, label::String) -> NothingWrite DataFrames to CSV files.
Arguments:
dataframes: Vector of DataFramesfolder: Output folderlabel: File label
Example:
# Write results to CSV
write_dataframes_only([df1, df2], "results", "analysis")Advanced Utility Functions
zero_median
zero_median(traces::Vector{Vector{Float64}}, zero::Bool) -> Tuple{Vector{Vector{Float64}}, Vector{Float64}}Zero-center traces by subtracting median values.
Arguments:
traces: Vector of trace vectorszero: Whether to zero-center
Returns:
- Tuple of (zero-centered traces, scale factors)
Example:
# Zero-center traces
centered_traces, scales = zero_median(traces, true)fix
fix(parameters::Vector{Float64}, fixedeffects::Tuple) -> Vector{Float64}Apply fixed effects to parameters.
Arguments:
parameters: Parameter vectorfixedeffects: Fixed effects specification
Returns:
- Parameters with fixed effects applied
Example:
# Apply fixed effects
fixed_params = fix(parameters, fixedeffects)fix_filenames
fix_filenames(folder::String, pattern::String, replacement::String) -> NothingFix filenames in a folder by pattern replacement.
Arguments:
folder: Folder pathpattern: Pattern to replacereplacement: Replacement string
Example:
# Fix filenames
fix_filenames("data", "old_prefix", "new_prefix")Model Analysis Functions
large_deviance
large_deviance(chains::Vector, threshold::Float64=1000.0) -> Vector{Int}Identify chains with unusually large deviance.
Arguments:
chains: MCMC chainsthreshold: Deviance threshold
Returns:
- Indices of chains with large deviance
Example:
# Find problematic chains
bad_chains = large_deviance(mcmc_chains, 500.0)large_rhat
large_rhat(stats, threshold::Float64=1.1) -> Vector{Int}Identify parameters with large R-hat values.
Arguments:
stats: MCMC statisticsthreshold: R-hat threshold
Returns:
- Indices of parameters with large R-hat
Example:
# Find non-converged parameters
unconverged = large_rhat(mcmc_stats, 1.05)assemblemeasuresmodel
assemble_measures_model(measures::Vector, model::AbstractModel) -> DictAssemble model measures for analysis.
Arguments:
measures: Vector of measure objectsmodel: Model structure
Returns:
- Dictionary of assembled measures
Example:
# Assemble measures
assembled = assemble_measures_model(measures, model)assemble_all
assemble_all(fits, stats, measures, model::AbstractModel) -> DictAssemble all analysis results.
Arguments:
fits: Fit resultsstats: Statisticsmeasures: Measuresmodel: Model structure
Returns:
- Dictionary of all results
Example:
# Assemble all results
all_results = assemble_all(fits, stats, measures, model)Performance Utilities
set_indices
set_indices(model::AbstractModel) -> Vector{Int}Set parameter indices for efficient computation.
Arguments:
model: Model structure
Returns:
- Vector of parameter indices
T_dimension
T_dimension(model::AbstractModel) -> IntCalculate transition matrix dimension.
Arguments:
model: Model structure
Returns:
- Matrix dimension
sparse
sparse(data::Matrix) -> SparseMatrixCSCConvert matrix to sparse format for efficiency.
Arguments:
data: Dense matrix
Returns:
- Sparse matrix
Usage Examples
Complete Analysis Pipeline
using StochasticGene
# Load and process data
data = load_data("rna", String[], "data/", "exp1", "MYC", "ctrl", (), 1)
normalized = normalize_histogram(data.histRNA)
# Set up model
rates = prepare_rates([0.1, 0.2], model)
n_params = num_all_parameters(transitions, 0, 0, 1, reporter, (), nothing)
# Analyze results
active_states = on_states(2, 0, 0, 1)
elong_time = mean_elongationtime(rates, 0)
# Output results
folder_setup("analysis", ["results", "figures"])
results_path = folder_path("results", "analysis", make=true)Parameter Extraction
# Extract different parameter types
all_rates = get_rates(fitted_params, model, true)
noise_params = get_param(fitted_params, model, "noise")
coupling_params = get_param(fitted_params, model, "coupling")
# Count parameters
n_rates = num_rates(transitions, 2, 1, 1)
n_total = num_all_parameters(transitions, 2, 1, 1, reporter, coupling, grid)Data Processing Pipeline
# Process trace data
traces, scales = zero_median(raw_traces, true)
trace_matrix = make_mat(traces)
normalized_traces = normalize_histogram.(traces)
# Statistical analysis
probs = [prob_Gaussian(t, μ, σ) for t in trace_values]
active = on_states(G, R, S, insertstep)See Also
fit: Main fitting functionsimulator: Model simulationload_data: Data loadingwrite_traces: Trace generation