fit Function
The full keyword list, coupling description, key-based loading, and return values are maintained in the in-source docstring for fit. In the Julia REPL, use ?fit after using StochasticGene. This page is a short summary; details may lag the code.
Fit steady state or transient GM/GRSM model to RNA data for a single gene, write the result (through function finalize), and return fit results and diagnostics.
For coupled transcribing units, arguments transitions, G, R, S, insertstep, and trace become tuples of the single unit type, e.g. If two types of transcription models are desired with G=2 and G=3 then G = (2,3).
Syntax
fits = fit(; kwargs...)Arguments
Basic Model Parameters
G::Int = 2: Number of gene statesR::Int = 0: Number of pre-RNA stepsS::Int = 0: Number of splice sites (must be ≤ R - insertstep + 1)insertstep::Int = 1: Reporter insertion step (must be ≥ 1; ignored when R = 0)transitions::Tuple: Tuple of vectors specifying state transitions
Data Parameters
datatype::String = "rna": Data type:- "rna": mRNA count distributions
- "trace": Intensity traces
- "rnadwelltime": Combined RNA and dwell time data
- "tracejoint": Simultaneous recorded traces
datapath::String = "": Path to data file or folderdatacond::String = "": Data condition identifiercell::String = "": Cell typegene::String = "MYC": Gene namenalleles::Int = 1: Number of alleles
Fitting Parameters
nchains::Int = 2: Number of MCMC chainsmaxtime::Float64 = 60: Maximum wall time (minutes)samplesteps::Int = 1000000: Number of MCMC sampling stepswarmupsteps::Int = 0: Number of warmup stepspropcv::Float64 = 0.01: Proposal distribution coefficient of variationannealsteps::Int = 0: Number of annealing stepstemp::Float64 = 1.0: MCMC temperature
Prior Parameters
priormean::Vector{Float64} = Float64[]: Mean rates for prior distributionpriorcv::Float64 = 10.0: Prior distribution coefficient of variationnoisepriors::Vector = []: Observation noise priorsfittedparams::Vector{Int} = Int[]: Indices of rates to fitfixedeffects::Tuple = tuple(): Fixed effects specification
Trace Parameters (for trace data)
traceinfo::Tuple = (1.0, 1., -1, 1.): Trace parameters:- Frame interval (minutes)
- Starting frame time (minutes)
- Ending frame time (-1 for last)
- Fraction of active traces
datacol::Int = 3: Data column indexprobfn::Function = prob_Gaussian: Observation probability functionnoiseparams::Int = 4: Number of noise parameterszeromedian::Bool = true: Subtract median from traces
Output Parameters
resultfolder::String = "test": Results output folderlabel::String = "": Output file labelinfolder::String = "": Folder for initial parametersinlabel::String = "": Label of initial parameter fileswritesamples::Bool = false: Write MCMC samples
Run specification and key-based naming
key = nothing: When nothing, fit uses the keyword arguments you pass (and defaults). When a string (e.g.key = "33il"), fit looks forinfo_<key>.tomlin the results folder; if found, loads that spec (kwargs override spec); if not found, uses kwargs and defaults. All outputs use that stem (e.g.rates_<key>.txt,info_<key>.toml). See Run specification (info TOML).
Returns
fits: MCMC fit results containing:- Posterior samples
- Log-likelihoods
- Acceptance rates
Examples
Basic RNA Histogram Fit
fits = fit(
G = 2,
R = 0,
transitions = ([1,2], [2,1]),
datatype = "rna",
datapath = "data/HCT116_testdata/",
gene = "MYC",
datacond = "MOCK"
)Trace Data Fit
fits = fit(
G = 3,
R = 2,
S = 2,
insertstep = 1,
transitions = ([1,2], [2,1], [2,3], [3,1]),
datatype = "trace",
datapath = "data/testtraces",
cell = "TEST",
gene = "test",
datacond = "testtrace",
traceinfo = (1.0, 1., -1, 1.),
noisepriors = [40., 20., 200., 10.],
nchains = 4
)Notes
Rate Order
- G transitions
- R transitions
- S transitions
- Decay
- Noise parameters
MCMC Convergence
- R-hat should be close to 1 (ideally < 1.05)
- Increase
maxtimeorsamplestepsif R-hat is high - Use
warmupstepsto improve proposal distribution