Getting Started

Folder Structure

StochasticGene expects a specific directory structure for data and results:

project_root/
├── data/
│   ├── alleles/      # Contains allele count information
│   └── halflives/    # Contains mRNA half-life information
└── results/          # Output directory for analysis results

Setting Up

Use the built-in helper to set up a new project folder with the required data structure and reference files:

using StochasticGene
rna_setup()           # Sets up folders in the current directory
# or, to set up in a subfolder:
rna_setup("scRNA")    # Sets up folders in ./scRNA
cd("scRNA")           # Move into the project directory (if you want to work there)

This will create the necessary folders and download reference CSV files for alleles and halflives. It does not generate arbitrary mock data, but provides the standard structure and example reference files needed to get started.

Data Preparation

RNA Count Data

RNA count data should be provided as a text file with the following format:

# gene_condition.txt
count1
count2
count3
...

The data will be loaded into an RNAData structure with the following fields:

histRNA: Vector of RNA counts
gene: Gene name
condition: Experimental condition
nalleles: Number of alleles

Live Cell Imaging Data

For live cell imaging data, provide a text file with time series data:

# gene_condition.txt
time1 intensity1
time2 intensity2
time3 intensity3
...

Fitting a Model

Basic Two-State Model

Fit a simple two-state model to RNA count data:

fits = fit(
    G = 2,                    # Two gene states
    R = 0,                    # No pre-RNA steps
    transitions = ([1,2], [2,1]),  # Transitions between states
    datatype = "rna",         # RNA count data
    datapath = "data/HCT116_testdata/",
    gene = "MYC",            # Gene name
    datacond = "MOCK"        # Experimental condition
)

Model with Pre-RNA Steps

Fit a model with pre-RNA processing steps:

fits = fit(
    G = 2,                    # Two gene states
    R = 3,                    # Three pre-RNA steps
    S = 2,                    # Two splice sites
    insertstep = 1,           # Reporter insertion at step 1
    transitions = ([1,2], [2,1]),  # Gene state transitions
    datatype = "trace",       # Live cell imaging data
    datapath = "data/testtraces",
    gene = "MS2",            # Gene name
    datacond = "testtrace",   # Experimental condition
    traceinfo = (1.0, 1., -1, 1.),  # Frame interval, start, end, active fraction
    noisepriors = [40., 20., 200., 10.],  # Noise parameters
    nchains = 4              # Number of MCMC chains
)

Model Components

Gene States (G)

Arbitrary number of gene states
User-specified transitions between states
One active state for transcription initiation
Example transitions:
- Two-state: ([1,2], [2,1]) (telegraph model)
- Three-state: ([1,2], [2,1], [2,3], [3,1]) (cyclic model)

Pre-RNA Steps (R)

Irreversible forward transitions
mRNA ejection from final R step
Optional reporter insertion step
Example: R=3 with reporter at step 1:
```
R = 3
insertstep = 1
```

Splicing (S)

Up to R splice sites
PreRNA with or without spliced intron
Multiple configurations per R step
Example: S=2 with R=3:
```
R = 3
S = 2
```

Model Types

StochasticGene supports several model types:

G Models (Telegraph Models)

Basic two-state model
Multiple gene states

Example:

G = 2
R = 0
transitions = ([1,2], [2,1])

GR Models (with Pre-RNA Steps)

Gene states plus pre-RNA processing

Example:

G = 2
R = 3
transitions = ([1,2], [2,1])

GRS Models (with Splicing)

Gene states, pre-RNA steps, and splicing
Example:
```
G = 2
R = 3
S = 2
insertstep = 1
```

Coupled Models (Multiple Alleles)

Multiple alleles with coupling

Example:

G = (2,2)  # Two alleles, each with 2 states
R = (3,3)  # Three pre-RNA steps for each
coupling = (
    (1,1),  # Model indices
    ([2], [1]),  # Source units
    (["G2"], ["G1"]),  # Source states
    ([2], [1])  # Target transitions
)

Data Types

The package can handle:

RNA Count Data

Single molecule FISH (smFISH)
Single cell RNA sequencing (scRNA-seq)

Example:

datatype = "rna"
datapath = "data/rna_counts/"

Live Cell Imaging

MS2 reporter data
PP7 reporter data

Example:

datatype = "trace"
datapath = "data/ms2_traces/"
traceinfo = (1.0, 1., -1, 1.)  # Frame interval, start, end, active fraction

Dwell Time Analysis

ON/OFF state durations

Example:

datatype = "dwelltime"
datapath = "data/dwell_times/"

Combined Data

RNA counts with dwell times

Example:

datatype = "rnadwelltime"
datapath = ["data/rna_counts/", "data/dwell_times/"]

Next Steps

Check the Examples section for more complex usage scenarios
Read the API Reference for detailed function documentation
Join the GitHub discussions for community support