RNA Histogram Analysis
This example demonstrates how to analyze RNA count distributions (histograms) from single-molecule FISH (smFISH) or single-cell RNA sequencing (scRNA-seq) data.
Setup
First, let's set up our project directory and load the package:
using StochasticGene
# Create project directory
mkdir("histogram_example")
cd("histogram_example")
# Generate example data using test_fit_simrna
fitted_rates, target_rates = test_fit_simrna(
rtarget=[0.33, 0.19, 2.5, 1.0], # Target rates for simulation
transitions=([1, 2], [2, 1]), # Simple two-state model
G=2, # 2 gene states
nRNA=100, # Number of RNA molecules
nalleles=2, # Number of alleles
fittedparam=[1, 2, 3], # Parameters to fit
nchains=1 # Single MCMC chain for example
)
# Print results
println("Fitted rates: ", fitted_rates)
println("Target rates: ", target_rates)
Data Preparation
Place your RNA count data in the data/
directory. The data should be organized as follows:
data/
├── gene_name/
│ ├── condition1/
│ │ ├── counts.csv
│ │ └── metadata.csv
│ └── condition2/
│ ├── counts.csv
│ └── metadata.csv
The counts.csv
file should contain:
- Cell IDs
- RNA counts per cell
- Optional: Additional cell metadata
Model Definition
We'll fit a model with:
- 2 gene states (G=2)
- No pre-RNA steps (R=0)
- Simple transitions between states
# Define model parameters
G = 2 # Number of gene states
R = 0 # Number of pre-RNA steps
transitions = ([1,2], [2,1]) # Gene state transitions
Fitting the Model
Now we can fit the model to our RNA count data:
# Fit the model
fits, stats, measures, data, model, options = fit(
G = G,
R = R,
transitions = transitions,
datatype = "rna",
datapath = "data/",
gene = "MYC",
datacond = "CONTROL"
)
Analyzing Results
Basic Analysis
# Print basic statistics
println(stats)
# Plot the results
using Plots
plot(fits)
# Save results
save_results(fits, "results/")
Histogram Analysis
# Generate and plot histograms
plot_histograms(fits, "results/histograms/")
# Calculate histogram statistics
hist_stats = analyze_histograms(fits)
println(hist_stats)
Burst Analysis
# Analyze transcriptional bursts
burst_stats = analyze_bursts(fits)
println(burst_stats)
# Plot burst statistics
plot_bursts(fits, "results/bursts/")
Advanced Analysis
Multiple Conditions
# Compare different conditions
conditions = ["CONTROL", "TREATMENT"]
fits_list = []
for cond in conditions
fits, stats, measures, data, model, options = fit(
G = G,
R = R,
transitions = transitions,
datatype = "rna",
datapath = "data/",
gene = "MYC",
datacond = cond
)
push!(fits_list, fits)
end
# Compare conditions
compare_conditions(fits_list, conditions, "results/comparison/")
Model Comparison
# Fit different models
models = [
(G=2, R=0), # Basic telegraph
(G=3, R=0), # Three-state
(G=2, R=1) # With pre-RNA
]
model_fits = []
for (G, R) in models
fits, stats, measures, data, model, options = fit(
G = G,
R = R,
transitions = ([1,2], [2,1]),
datatype = "rna",
datapath = "data/",
gene = "MYC",
datacond = "CONTROL"
)
push!(model_fits, (fits, stats))
end
# Compare models
compare_models(model_fits, models, "results/model_comparison/")
Best Practices
Data Quality
- Check for outliers
- Verify count normalization
- Consider batch effects
Model Selection
- Start with simple models
- Use model selection criteria
- Validate assumptions
Analysis Pipeline
- Document analysis steps
- Save intermediate results
- Use consistent naming conventions
Common Issues and Solutions
Zero-Inflation
# Handle zero-inflation
fits = fit(
G = G,
R = R,
transitions = transitions,
datatype = "rna",
datapath = "data/",
gene = "MYC",
datacond = "CONTROL",
zero_inflation = true
)
Batch Effects
# Account for batch effects
fits = fit(
G = G,
R = R,
transitions = transitions,
datatype = "rna",
datapath = "data/",
gene = "MYC",
datacond = "CONTROL",
batch_correction = true
)
Next Steps
- Try different model configurations
- Experiment with more complex models
- Compare results across different genes
For more advanced examples, see: