API Reference
Core Functions
Model Fitting
- fit: Main function for fitting models to data
- loglikelihood: Calculate log-likelihood for model parameters
- run_mh: Run Metropolis-Hastings MCMC sampling
- run_mcmc_parallel: Run parallel MCMC chains
Model Simulation
- simulator: Simulate stochastic gene expression models
- simulate_trace: Generate intensity traces from models
- simulate_trace_data: Generate trace data with metadata
- simulate_trace_vector: Generate vectors of traces
- simulate_trials: Simulate multiple model realizations
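To give a feel for what stochastic simulation of a gene model involves, here is a minimal Gillespie sketch of a two-state (telegraph) gene. It is self-contained and does not call the package: `telegraph_sketch` and its rate ordering `(kon, koff, eject, decay)` are assumptions made for illustration, and the package's `simulator` handles far more general models.

```julia
using Random

# Hypothetical sketch, not the package's simulator.
# rates = (kon, koff, eject, decay); returns the mRNA count at time tmax.
function telegraph_sketch(rng::AbstractRNG, rates, tmax::Real)
    kon, koff, eject, decay = rates
    t, on, m = 0.0, false, 0
    while true
        # Propensities: switch gene state, eject mRNA (ON only), decay mRNA
        a_switch = on ? koff : kon
        a_eject = on ? eject : 0.0
        a_decay = decay * m
        a_total = a_switch + a_eject + a_decay
        # Exponential waiting time to the next reaction
        t += randexp(rng) / a_total
        t > tmax && return m
        # Choose which reaction fires, proportional to its propensity
        u = rand(rng) * a_total
        if u < a_switch
            on = !on
        elseif u < a_switch + a_eject
            m += 1
        elseif m > 0
            m -= 1
        end
    end
end

rng = MersenneTwister(1)
counts = [telegraph_sketch(rng, (0.1, 0.2, 5.0, 1.0), 50.0) for _ in 1:200]
```

Each call produces one independent realization, so `counts` plays the role of a small simulated single-cell data set.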
Data Loading and Management
- load_data: Load experimental data from files
- load_model: Load model parameters from files
- rna_setup: Set up project directory structure
- readrates: Read rate parameters from files
- read_run_spec, read_run_spec_for_rates_file, info_toml_path_for_rates_file: Load run specification from info TOML (see Run specification (info TOML))
- readfile: Read data files with error handling
Analysis and Visualization
- write_traces: Generate model-predicted intensity traces
- write_ONOFFhistograms: Generate ON/OFF dwell time histograms
- write_residency_G_folder: Generate G state residency probabilities
- write_histograms: Write RNA histogram predictions
- write_dataframes: Write results to CSV files
Comprehensive Function Documentation
Core Function Libraries
- Utilities: Data processing, model construction, and utility functions
- Analysis: Post-fitting analysis, model comparison, and visualization functions
Concepts (coupled systems)
- Units and models: Units vs models, the unit–model map (coupling[1]), and flat rate vector order
Data Types
RNA Data Structures
RNAData
struct RNAData{nType,hType} <: AbstractRNAData{hType}
    label::String   # Data set label
    gene::String    # Gene name (case sensitive)
    nRNA::nType     # Length of histogram
    histRNA::hType  # RNA histogram data
end
A structure for storing steady-state RNA count distributions from techniques like smFISH or scRNA-seq.
Example:
# Create RNA data from histogram
data = RNAData(
    "control",          # label
    "MYC",              # gene
    5,                  # nRNA (length of histRNA)
    [10,20,30,25,15]    # histRNA
)
RNACountData
struct RNACountData <: AbstractRNAData{Vector{Int}}
    label::String
    gene::String
    nRNA::Int
    countsRNA::Vector{Int}
    yieldfactor::Vector{Float64}
end
A structure for storing individual RNA count measurements with yield factors.
Example:
# Create RNA count data with yield correction
data = RNACountData(
    "single_cell",      # label
    "ACTB",             # gene
    100,                # nRNA
    [1,2,3,4,5],        # countsRNA
    [0.8,0.9,0.85]      # yieldfactor
)
RNAOnOffData
struct RNAOnOffData <: AbstractHistogramData
    label::String
    gene::String
    nRNA::Int
    histRNA::Vector
    bins::Vector
    ON::Vector   # ON time probability density
    OFF::Vector  # OFF time probability density
end
A structure for storing combined RNA count and ON/OFF state duration data.
Example:
# Create combined RNA and ON/OFF data
data = RNAOnOffData(
    "live_cell",            # label
    "SOX2",                 # gene
    30,                     # nRNA
    [5,10,15,20],           # histRNA
    [0,1,2,3,4],            # bins
    [0.1,0.3,0.4,0.2],      # ON
    [0.2,0.4,0.3,0.1]       # OFF
)
RNADwellTimeData
struct RNADwellTimeData <: AbstractHistogramData
    label::String
    gene::String
    nRNA::Int
    histRNA::Array
    bins::Vector{Vector}
    DwellTimes::Vector{Vector}
    DTtypes::Vector
end
A structure for storing RNA counts with dwell time distributions.
Trace Data Structures
TraceData
struct TraceData{labelType,geneType,traceType} <: AbstractTraceData
    label::labelType   # Data set label
    gene::geneType     # Gene name
    interval::Float64  # Time interval between trace points
    trace::traceType   # Trace data
end
A structure for storing fluorescence intensity time series data.
Example:
# Create trace data
traces = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
data = TraceData(
    "live_imaging",     # label
    "MYC",              # gene
    1.0,                # interval (minutes)
    traces              # trace data
)
TraceRNAData
struct TraceRNAData{traceType,hType} <: AbstractTraceHistogramData
    label::String
    gene::String
    interval::Float64
    trace::traceType
    nRNA::Int
    histRNA::hType
end
A structure for storing both trace and RNA histogram data.
DwellTimeData
struct DwellTimeData <: AbstractHistogramData
    label::String
    gene::String
    bins::Vector
    DwellTimes::Vector
    DTtypes::Vector
end
A structure for storing dwell time distributions only.
Model Types
Gene Model (GM)
GMmodel
struct GMmodel{RateType,PriorType,ProposalType,ParamType,MethodType,ComponentType,ReporterType} <: AbstractGMmodel
    rates::RateType            # Transition rates
    Gtransitions::Tuple        # G state transitions
    G::Int                     # Number of G states
    nalleles::Int              # Number of alleles
    rateprior::PriorType       # Rate prior distributions
    proposal::ProposalType     # MCMC proposal distribution
    fittedparam::ParamType     # Fitted parameter indices
    fixedeffects::Tuple        # Fixed effects specification
    method::MethodType         # Solution method
    components::ComponentType  # Model components
    reporter::ReporterType     # Reporter configuration
end
A structure for Gene (G) models with arbitrary numbers of gene states.
Example:
# Create a simple two-state gene model
model = GMmodel(
    rates = [0.1, 0.2],              # G1->G2, G2->G1
    Gtransitions = ([1,2], [2,1]),   # State transitions
    G = 2,                           # Two gene states
    nalleles = 2,                    # Diploid
    # ... other parameters
)
Gene-Reporter-Splice Model (GRSM)
GRSMmodel
struct GRSMmodel{TraitType,RateType,nratesType,GType,PriorType,ProposalType,ParamType,MethodType,ComponentType,ReporterType} <: AbstractGRSMmodel{TraitType}
    trait::TraitType             # Model traits (hierarchical, coupling, etc.)
    rates::RateType              # Transition rates
    transforms::Transformation   # Rate transformations
    nrates::nratesType           # Number of rates
    Gtransitions::Tuple          # G state transitions
    G::GType                     # Number of G states
    R::GType                     # Number of R (pre-RNA) steps
    S::GType                     # Number of S (splice) sites
    insertstep::GType            # Reporter insertion step
    nalleles::Int                # Number of alleles
    splicetype::String           # Splicing type
    rateprior::PriorType         # Rate prior distributions
    proposal::ProposalType       # MCMC proposal distribution
    fittedparam::ParamType       # Fitted parameter indices
    fixedeffects::Tuple          # Fixed effects specification
    method::MethodType           # Solution method
    components::ComponentType    # Model components
    reporter::ReporterType       # Reporter configuration
end
A comprehensive structure for Gene-Reporter-Splice models with arbitrary numbers of gene states (G), pre-RNA steps (R), and splice sites (S).
Example:
# Create a GRS model with 2 gene states, 3 pre-RNA steps, 2 splice sites
model = GRSMmodel(
    rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # G, R, S transitions
    Gtransitions = ([1,2], [2,1]),           # G state transitions
    G = 2,                                   # Two gene states
    R = 3,                                   # Three pre-RNA steps
    S = 2,                                   # Two splice sites
    insertstep = 1,                          # Reporter visible from step 1
    nalleles = 2,                            # Diploid
    splicetype = "offeject",                 # Splicing type
    # ... other parameters
)
Supporting Types
HMMReporter
struct HMMReporter
    n::Int                    # Number of noise parameters
    per_state::Vector         # Reporters per state
    probfn::Function          # Noise distribution function
    weightind::Int            # Mixture weight index
    offstates::Vector{Int}    # Off states
    noiseparams::Vector{Int}  # Noise parameter indices
end
A structure for configuring reporter properties in Hidden Markov Models.
Transformation
struct Transformation
    f::Vector{Function}      # Forward transformations
    f_inv::Vector{Function}  # Inverse transformations
    f_cv::Vector{Function}   # CV transformations
end
A structure for parameter transformations during fitting.
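The pairing of forward and inverse transformations can be illustrated with a small stand-alone sketch. `TransformSketch` and `apply_each` are hypothetical names defined here, not part of the package, and the choice of `log`/`exp` (mapping positive rates to an unconstrained scale and back) is an assumption about typical usage rather than a statement of what the package does.

```julia
# Hypothetical stand-in, not the package's Transformation type.
struct TransformSketch
    f::Vector{Function}      # forward: rate -> unconstrained value (assumed direction)
    f_inv::Vector{Function}  # inverse: unconstrained value -> rate
end

# Apply the i-th function to the i-th parameter.
apply_each(fs::Vector{Function}, x::AbstractVector) = [fs[i](x[i]) for i in eachindex(x)]

t = TransformSketch(Function[log, log], Function[exp, exp])
rates = [0.1, 0.2]
z = apply_each(t.f, rates)        # values on the transformed scale
back = apply_each(t.f_inv, z)     # mapped back to positive rates
```

Keeping one function per parameter allows different parameters to use different transformations.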
Utility Functions
Data Processing
- normalize_histogram: Normalize probability distributions
- make_array: Convert data to arrays
- make_mat: Convert data to matrices
- digit_vector: Convert numbers to digit vectors
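As an illustration of the kind of operation listed above, histogram normalization can be sketched in a few lines. `normalize_sketch` is a hypothetical helper written for this example; the package's `normalize_histogram` may accept different arguments or handle edge cases differently.

```julia
# Hypothetical helper, not the package's normalize_histogram.
function normalize_sketch(hist::AbstractVector{<:Real})
    total = sum(hist)
    total > 0 || throw(ArgumentError("histogram must have a positive total"))
    return hist ./ total   # entries now sum to one
end

p = normalize_sketch([10, 20, 30, 25, 15])
```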
Model Construction
- prepare_rates: Prepare rate parameters for fitting
- get_rates: Extract rates from fitted parameters
- get_param: Extract specific parameters
- num_rates: Count number of rates in model
- num_all_parameters: Count total parameters
Statistical Functions
- prob_Gaussian: Gaussian probability density
- prob_Gaussian_grid: Gaussian probability on grid
- mean_elongationtime: Calculate mean elongation time
- on_states: Identify transcriptionally active states
- source_states: Identify source states for transitions
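For orientation, a Gaussian emission density of the sort used to score trace points can be written directly. `gaussian_pdf` below is a hypothetical stand-alone function; the signature of the package's `prob_Gaussian` is not documented here and may differ.

```julia
# Hypothetical helper, not the package's prob_Gaussian.
gaussian_pdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# Density of each trace point under a state with mean 2.0 and standard deviation 0.5.
trace = [1.0, 2.0, 3.0]
dens = gaussian_pdf.(trace, 2.0, 0.5)
```

The density peaks at the mean and is symmetric around it, so the first and last points here receive equal weight.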
File I/O
- folder_path: Construct folder paths
- folder_setup: Set up directory structure
- datapdf: Generate data PDFs
- make_dataframes: Create DataFrames from results
- write_dataframes_only: Write DataFrames to files
Cluster and combined-rate batch workflows
Narrative guide (when to use which function, file layout, order of operations): Cluster and batch workflows.
Relevant APIs include makeswarm, makeswarm_models, makeswarmfiles, makeswarmfiles_h3_latent, write_run_spec_preset, normalize_trace_specs_legacy_t_end!, create_combined_file, create_combined_file_mult, combined_rates_key, read_combined_file_specs_csv, create_combined_files_driver, create_combined_files, create_combined_files_h3_latent, read_rates_table, write_rates_table.
Analysis Functions
Model Diagnostics
- large_deviance: Identify chains with large deviance
- large_rhat: Identify parameters with large R-hat
- assemble_measures_model: Assemble model measures
- assemble_all: Assemble all results
Post-fitting Analysis
- predictedarray: Generate model predictions
- predictedfn: Generate prediction functions
- make_traces: Generate trace predictions
- make_traces_dataframe: Convert traces to DataFrames
Visualization Support
- write_cov: Write covariance matrices
- write_augmented: Write augmented results
- write_winners: Write best-fit results
Hierarchical and Coupling Models
Hierarchical Models
For hierarchical models, use the hierarchical parameter in fit():
# Fit hierarchical model
fits = fit(
    hierarchical = (2, [1,2]),  # 2 hyperparameter sets, fit rates 1,2
    # ... other parameters
)
Coupled Models
For coupled transcriptional units, use tuples for model parameters:
# Fit coupled model with different G states
fits = fit(
    G = (2, 3),  # Unit 1: 2 states, Unit 2: 3 states
    transitions = (([1,2], [2,1]), ([1,2], [2,3], [3,1])),
    coupling = ((1, 2), [(1, 3, 2, 4)]),  # (unit_model, connections); each connection (β, s, α, t)
    # ... other parameters
)
Model Components
Gene States (G)
- Arbitrary number of gene states
- User-specified transitions between states
- One active state for transcription initiation
- Support for multiple alleles
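One way to picture user-specified transitions is as a generator (rate) matrix built from the transition pairs and their rates. The sketch below is an assumption about how such a specification can be interpreted, not the package's internal construction; `generator_sketch` is a hypothetical helper, and the `[from, to]` reading of each pair follows the `G1->G2, G2->G1` comment in the GMmodel example.

```julia
using LinearAlgebra

# Hypothetical helper: generator matrix Q with Q[to, from] = rate,
# so that dP/dt = Q * P and every column sums to zero.
function generator_sketch(transitions, rates, G::Int)
    Q = zeros(G, G)
    for (k, (from, to)) in enumerate(transitions)
        Q[to, from] += rates[k]     # flow into state `to`
        Q[from, from] -= rates[k]   # matching flow out of state `from`
    end
    return Q
end

Q = generator_sketch(([1, 2], [2, 1]), [0.1, 0.2], 2)

# Steady-state occupancy: null vector of Q, normalized to sum to one.
p = nullspace(Q)[:, 1]
p ./= sum(p)
```

With these rates the gene spends two thirds of its time in state 1, since leaving state 1 is half as fast as returning to it.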
Pre-RNA Steps (R)
- Irreversible forward transitions
- mRNA ejection from final R step
- Optional reporter insertion at any step
- Support for elongation dynamics
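Because the R steps are irreversible and traversed in order, a simple summary of elongation is the expected time to pass through all of them. The sketch below assumes each step has an exponentially distributed waiting time; `mean_traversal_time` is a hypothetical helper, and the package's `mean_elongationtime` may be defined differently.

```julia
# Hypothetical helper: expected time to traverse irreversible steps in sequence,
# assuming an exponential waiting time with the given rate at each step.
mean_traversal_time(step_rates) = sum(1 ./ step_rates)

t_mean = mean_traversal_time([0.5, 0.5, 1.0])   # 2 + 2 + 1 time units
```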
Splicing (S)
- Up to R splice sites
- PreRNA with or without spliced intron
- Multiple configurations per R step
- Support for different splicing types:
  - "": No splicing
  - "offeject": Splice then eject
  - "offdecay": Splice then decay
Supported Data Types
The package handles multiple experimental data types:
mRNA Count Distributions
- Single molecule FISH (smFISH)
- Single cell RNA sequencing (scRNA-seq)
- Bulk RNA measurements
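Count data from such experiments are typically summarized as a histogram before fitting. A minimal sketch, assuming per-cell integer counts and a histogram whose first entry corresponds to zero mRNA; `count_histogram_sketch` is a hypothetical helper, not a package function.

```julia
# Hypothetical helper: histogram over counts 0..max, where index i holds count i-1.
function count_histogram_sketch(counts::AbstractVector{<:Integer})
    hist = zeros(Int, maximum(counts) + 1)
    for c in counts
        hist[c + 1] += 1
    end
    return hist
end

hist = count_histogram_sketch([0, 2, 1, 2, 0, 2])
```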
Intensity Traces
- Live cell fluorescence microscopy
- Time-lapse imaging
- Multiple reporter constructs
Dwell Time Distributions
- ON/OFF state durations
- Transcriptional burst analysis
- Reporter visibility periods
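ON and OFF durations can be extracted from a binarized trace by measuring run lengths. The sketch below assumes a Boolean state sequence sampled at a fixed interval; `dwell_times_sketch` is a hypothetical helper and does not reflect how the package computes or bins dwell times. Note that the first and last runs are truncated by the observation window and are included here without correction.

```julia
# Hypothetical helper: durations of consecutive ON (true) and OFF (false) runs.
function dwell_times_sketch(on::AbstractVector{Bool}, interval::Real)
    ON, OFF = Float64[], Float64[]
    isempty(on) && return (ON, OFF)
    runlen = 1
    for i in 2:length(on)
        if on[i] == on[i-1]
            runlen += 1
        else
            # State changed: record the run that just ended
            push!(on[i-1] ? ON : OFF, runlen * interval)
            runlen = 1
        end
    end
    # Record the final run
    push!(on[end] ? ON : OFF, runlen * interval)
    return (ON, OFF)
end

states = Bool[0, 0, 1, 1, 1, 0, 1]
ON, OFF = dwell_times_sketch(states, 1.0)
```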
Combined Data Types
- RNA + trace data
- RNA + dwell time data
- Multi-modal experiments
Error Handling
The package includes comprehensive error handling:
- Data validation: Checks for consistent data formats
- Parameter validation: Ensures valid model specifications
- Convergence monitoring: Tracks MCMC convergence
- Memory management: Handles large datasets efficiently
Performance Considerations
- Parallel processing: Use nchains > 1 for parallel MCMC
- Memory usage: Large datasets may require cluster computing
- Convergence: Monitor R-hat values for chain convergence
- Optimization: Use appropriate priors for faster convergence
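To make the R-hat check concrete, here is a sketch of the classic Gelman-Rubin statistic for a single parameter sampled by several chains. `rhat_sketch` is a hypothetical helper; the estimator used by the package (for example in `large_rhat`) is not specified here and may differ, such as a split-chain variant.

```julia
using Statistics

# Hypothetical helper: Gelman-Rubin R-hat for one parameter.
# `chains` has one column per chain and one row per sample.
function rhat_sketch(chains::AbstractMatrix{<:Real})
    n, m = size(chains)
    chain_means = vec(mean(chains, dims=1))
    W = mean(vec(var(chains, dims=1)))   # mean within-chain variance
    B = n * var(chain_means)             # between-chain variance
    var_hat = (n - 1) / n * W + B / n    # pooled variance estimate
    return sqrt(var_hat / W)
end

# Two chains stuck in different regions give a value far above 1.
bad = hcat([1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0])
r_bad = rhat_sketch(bad)
```

Values close to 1 suggest the chains agree; a common rule of thumb treats values above roughly 1.1 as a sign that more sampling is needed.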