Cluster and batch workflows
This chapter is for everyone who installs StochasticGene.jl (Pkg.add("StochasticGene")) and needs to:
- run many fit jobs on a cluster (including NIH Biowulf) using swarm files and makeswarm;
- follow the recommended coupled-model workflow: fit individual units first, then merge those fitted rates into one initial rate file for the coupled model.
The implementations live in biowulf.jl (swarms, run-spec presets) and io.jl (merging rate tables). Function signatures and defaults are in the docstrings; this page is the narrative guide published with the GitHub-hosted documentation.
Coupled models: recommended workflow (single units → merge → coupled fit)
For coupled transcriptional units (e.g. enhancer + gene, with or without a hidden unit), the suggested workflow is:
1. Fit each unit as its own single-unit model. Run separate fits (e.g. enhancer-only and gene-only traces or histograms) so each produces a standard MCMC rates_*.txt (rows = posterior samples, columns = rate headers for that unit). Use normal fit calls or batch them with makeswarm_models / makeswarmfiles in single-unit mode.
2. Merge the fitted rates into one wide table. Stack the columns from the two (or more) unit files and append coupling placeholder columns using create_combined_file (two units) or create_combined_file_mult (more than two). You choose Nenh / Ngene (or per-unit column counts) to match how each set of rates is laid out in your files (see docstrings). For many keys (e.g. from a CSV of model names), use create_combined_files or create_combined_files_driver, which call create_combined_file once per key and name outputs with combined_rates_key.
3. Run the coupled fit using the combined file as the starting rates. Point infolder / inlabel (or your run spec) at that merged file so the coupled MCMC warm-starts from the stacked single-unit posteriors. The coupled fit uses tuple G, R, coupling, joint datatype (e.g. tracejoint), etc. Coupling strengths are then estimated in the coupled run (the placeholder columns from step 2 get updated).
4. Optional: batch everything on the cluster. Use makeswarm or makeswarmfiles so each job runs fit(; key=..., ...) from prewritten info_<key> specs (see Run specification (info TOML)).
This order (individual fits → merge → coupled fit) is the standard way to get a sensible initial combined rate file for coupled models: the coupled run starts from fitted single-unit rates instead of fitting all parameters cold.
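The merge step can be pictured with a minimal sketch. This is not the package's create_combined_file; it only illustrates "stack the columns, then append placeholder coupling columns". The function name stack_rates_sketch, the comma delimiter, the single header row, and the Coupling_i column names are all assumptions made for illustration. For real work, use the package functions and their docstrings.

```julia
using DelimitedFiles

# Hypothetical helper, NOT StochasticGene's create_combined_file.
# Assumes comma-delimited tables with one header row (an assumption).
function stack_rates_sketch(file_a, file_b, outfile; ncoupling=1, placeholder=0.0)
    a, header_a = readdlm(file_a, ',', Float64; header=true)
    b, header_b = readdlm(file_b, ',', Float64; header=true)

    # The two tables may hold different numbers of samples; keep the common rows.
    nrows = min(size(a, 1), size(b, 1))

    # Placeholder coupling columns, to be updated by the coupled fit.
    coupling = fill(placeholder, nrows, ncoupling)
    merged = hcat(a[1:nrows, :], b[1:nrows, :], coupling)

    # Hypothetical header names for the appended columns.
    coupling_header = reshape(["Coupling_$i" for i in 1:ncoupling], 1, :)
    header = hcat(header_a, header_b, coupling_header)

    open(outfile, "w") do io
        writedlm(io, header, ',')
        writedlm(io, merged, ',')
    end
    return merged
end
```

Truncating to the common number of rows is one possible design choice; the package may handle unequal sample counts differently.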
NIH Biowulf: using makeswarm
makeswarm does not submit jobs to the scheduler by itself. It writes files you submit with Biowulf’s swarm (or your own sbatch wrappers):
- <swarmfile>.swarm: one command line per run key (each line runs julia with your project and one fit script).
- fitscript_<key>.jl per key: typically calls fit(; key="<key>", ...) with shared options (resultfolder, maxtime, samplesteps, etc.).
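To make the "one command line per run key" layout concrete, here is a hypothetical illustration. makeswarm generates these files for you, and the exact command line it writes may differ; the function swarm_lines_sketch and the flag order are assumptions. The flags --project, -p, and -t themselves are standard Julia command-line options.

```julia
# Hypothetical illustration only: shows the shape of a swarm file,
# not the exact text that makeswarm writes.
function swarm_lines_sketch(runkeys; project="", nchains=2, nthreads=1)
    # Omit --project when the default environment is used.
    proj = isempty(project) ? "" : " --project=$project"
    return ["julia$proj -p $nchains -t $nthreads fitscript_$key.jl" for key in runkeys]
end

lines = swarm_lines_sketch(["runA", "runB"]; nchains=4)
foreach(println, lines)
```

Each resulting line is an independent job, which is why the swarm file can be handed to the scheduler as-is.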
Typical use on Biowulf
- Install StochasticGene in your Julia environment (see Installation, including Biowulf Installation).
- From Julia (interactive session or batch script), run something like:
using StochasticGene

makeswarm(
    ["runA", "runB"];                            # keys; must match info_<key> / rates_<key> naming you use
    filedir = "my_swarm",                        # directory where .swarm and .jl files are written
    resultfolder = "my_results",
    root = ".",
    project = "/path/to/your/StochasticGene.jl", # or "" if using the default environment
    nchains = 4,
    nthreads = 1,
    maxtime = 72000.0,
    samplesteps = 1_000_000,
)

- Submit the swarm from the shell (example):
cd my_swarm
swarm -g 4 -t 16 -b 1 --time 24:00:00 --module julialang -f fit.swarm

Adjust -g, -t, time, and module to match your allocation and the Julia module name on Biowulf.
Generating keys and info_<key> in bulk
- write_run_spec_preset: write info_<key>.jld2 + marker TOML for one key.
- makeswarm_models: sweep single-unit G, R, S, insertstep, write presets, then call makeswarm.
- makeswarmfiles: unified entry point for coupled key lists (CSV, explicit base_keys, or H3 grids) or single-unit sweeps; writes presets and runs makeswarm. See its docstring for the mutually exclusive modes.
- makeswarmfiles_h3_latent: convenience wrapper for H3 latent key grids.
Swarm julia -p, nchains, merged info_<key>, and root
- Parallel workers: The swarm command should use -p N (or equivalent) consistent with how many chains run in parallel. For makeswarmfiles / makeswarm_models, if you do not pass an explicit swarm-only nchains= in kwargs, the generated -p is taken from each run spec's nchains (e.g. coupled defaults often use 16), so it stays aligned with fit(; …, nchains=…). See the makeswarmfiles docstring.
- Merged presets: With merge_existing_info=true (default), older info_<key>.jld2 files are merged into new specs. Legacy trace_specs sometimes used a huge t_end (historical “open end” sentinel). When saving, write_run_spec_preset runs normalize_trace_specs_legacy_t_end! so those values are rewritten to t_end = -1.0, matching current default_trace_specs_for_coupled and avoiding invalid frame indices in read_tracefiles.
- root in generated fit scripts: Scripts list root exactly as in the run spec (no forced abspath). Use root="." if the job's working directory is the project root (set cd in the swarm or submit from the right folder). Paths resolved in an interactive Biowulf session can differ from batch jobs; "." avoids baking in an interactive-only absolute path.
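The legacy t_end rewrite can be illustrated with a small sketch. This is not the package's normalize_trace_specs_legacy_t_end!, whose internals are not described here. The Dict representation of a trace spec, the "t_end" key, and the sentinel_min threshold are all assumptions chosen only to show the idea: very large values are treated as the old "open end" marker and replaced by -1.0.

```julia
# Hypothetical sketch, NOT the package's normalize_trace_specs_legacy_t_end!.
# Assumptions: each trace spec is a Dict with a "t_end" entry, and any value
# above `sentinel_min` is a legacy "open end" marker.
function normalize_t_end_sketch!(trace_specs; sentinel_min=1.0e6)
    for spec in trace_specs
        t_end = get(spec, "t_end", -1.0)
        if t_end > sentinel_min
            # -1.0 is the value the current defaults use, per the text above.
            spec["t_end"] = -1.0
        end
    end
    return trace_specs
end

specs = [Dict{String,Any}("t_end" => 1.0e10), Dict{String,Any}("t_end" => 120.0)]
normalize_t_end_sketch!(specs)
```

Ordinary finite end times (here 120.0) are left untouched; only the sentinel-sized value is rewritten.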
Key-based naming
Many batch helpers assume a string key per run:
- results/<resultfolder>/info_<key>.toml and info_<key>.jld2
- rates_<key>.txt
See Run specification (info TOML). Presets for cluster reruns are written with write_run_spec_preset.
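Before submitting a swarm, it can help to check that every key has its spec file. The sketch below is a hypothetical helper, not part of StochasticGene; it assumes the info TOML files sit under results/<resultfolder>/ relative to root, following the layout listed above.

```julia
# Hypothetical helper (not a StochasticGene function): return the keys
# whose info_<key>.toml is missing from results/<resultfolder>/.
function missing_info_keys(runkeys; root=".", resultfolder="my_results")
    folder = joinpath(root, "results", resultfolder)
    return [k for k in runkeys if !isfile(joinpath(folder, "info_$k.toml"))]
end
```

For example, missing_info_keys(["runA", "runB"]; resultfolder="my_results") returns the subset of keys that still need a preset written.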
Combined rate files (io.jl) — reference
- read_rates_table, write_rates_table
- merge_coupled_two_unit_rates, merge_coupled_stacked_units
- create_combined_file, create_combined_file_mult
- read_combined_file_specs_csv, create_combined_files_driver, create_combined_files, create_combined_files_h3_latent
After the coupled fit
Post-processing examples: write_correlation_functions, write_traces, and other analysis functions in the API Reference.
See also
- Run specification (info TOML)
- Coupled model analysis (example-focused; batch mechanics are on this page)
- Installation (includes a Biowulf subsection)
- Model fitting (fit)
Maintainer note: where to document what
| Topic | Canonical place |
|---|---|
| User workflows, Biowulf, coupled merge order | This page (hosted docs) |
| info_<key> file format | runspectoml.md |
| README on GitHub | Short pointer + link to stable docs |
| Exact function signatures | Docstrings in biowulf.jl / io.jl |