Usage
CompareM2 is built on top of Snakemake. Pipeline parameters are passed via --config, and all Snakemake command-line options are available.
comparem2 [ --config KEY=VALUE [KEY2=VALUE]... ]
[ --until RULE [RULE2]... ]
[ --forcerun RULE [RULE2]... ]
[ --downloads ]
[ --printshellcmds ]
[ --dry-run ]
[ --status ]
[ --version ] [ --help ] [ --cite ]
Examples
Run all analyses on all genome files (*.fna *.fa *.fasta *.fas) in the current directory:
comparem2
Run only the fast analyses:
comparem2 --until fast
Specify input and output paths:
comparem2 --config input_genomes="path/to/genomes_*.fna" output_directory="my_analysis"
Use a file-of-filenames (fofn):
ls path/to/*.fna > my_fofn.txt
comparem2 --config fofn="my_fofn.txt"
Dry run (preview what will run without executing):
comparem2 --config input_genomes="path/to/genomes_*.fna" --dry-run
Analyze NCBI reference genomes by accession:
comparem2 --config add_ncbi="GCF_009734005.1,GCF_029023785.1"
Use Prokka instead of the default Bakta annotator:
comparem2 --config annotator="prokka"
Combine options: run the fast analyses plus panaroo (using the default Bakta annotation):
comparem2 --config input_genomes="path/to/genomes_*.fna" --until fast panaroo
Pass a parameter directly to an underlying tool:
comparem2 --config set_panaroo--threshold=0.95 --until fast panaroo
Configuration reference
Input genomes (input_genomes)
A glob pattern (or several space-separated patterns or paths) specifying which genome files to analyze. The default picks up all common FASTA extensions in the current directory. Examples:
- input_genomes="*.fna *.fa *.fasta *.fas" (default)
- input_genomes="path/to/my/genomes*.fna"
- input_genomes="path/genome1.fna path/genome2.fna"
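Before a long run, it can help to see which files a pattern will pick up. The sketch below assumes the pattern is expanded the same way the shell expands globs, which is an assumption (CompareM2 may resolve patterns differently); `preview_genomes` is a hypothetical helper, not part of CompareM2. Running with --dry-run remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files matched by a space-separated list of glob patterns.
# Assumes CompareM2 expands input_genomes like the shell does; confirm with --dry-run.
preview_genomes() {
  local pattern="$1"
  local f
  # $pattern is intentionally unquoted so the shell splits it and expands each glob.
  for f in $pattern; do
    # The shell leaves unmatched globs as literal text, so skip paths that do not exist.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Preview what the default pattern matches in the current directory.
preview_genomes "*.fna *.fa *.fasta *.fas"
```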
File of file names (fofn)
For larger sets of genomes, list paths in a text file (one per line). When set, fofn overrides input_genomes:
ls *.fna > fofn.txt
comparem2 --config fofn="fofn.txt"
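When genomes sit in nested subdirectories, `find` can build the fofn instead of `ls`. The sketch below also checks that every listed path exists before the file is handed to CompareM2. The extensions mirror the input_genomes default, and both `make_fofn` and `check_fofn` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write every FASTA file found (recursively) under a directory to a fofn.
# Extensions mirror the input_genomes default; adjust as needed.
make_fofn() {
  local dir="$1" out="$2"
  find "$dir" -type f \( -name '*.fna' -o -name '*.fa' -o -name '*.fasta' -o -name '*.fas' \) \
    | sort > "$out"
}

# Hypothetical sanity check: report any listed path that is not a regular file,
# and return non-zero if at least one is missing.
check_fofn() {
  local status=0 path
  while IFS= read -r path; do
    if [ ! -f "$path" ]; then
      printf 'missing: %s\n' "$path" >&2
      status=1
    fi
  done < "$1"
  return "$status"
}

# Example (paths are placeholders):
# make_fofn path/to/genomes fofn.txt && check_fofn fofn.txt && comparem2 --config fofn="fofn.txt"
```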
Pre-annotated NCBI reference genomes (add_ncbi)
Add reference genomes from NCBI/GenBank by accession. Genomes and their PGAP annotations are downloaded automatically via NCBI Datasets. Multiple accessions are comma-separated:
comparem2 --config add_ncbi="GCF_029023785.1,GCF_009734005.1"
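For a longer list of references, it may be easier to keep one accession per line in a text file and join them into the comma-separated form add_ncbi expects. This is a minimal sketch; `join_accessions` and the file name accessions.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join accessions (one per line) into a comma-separated string.
# Blank or whitespace-only lines are dropped before joining.
join_accessions() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Example (accessions.txt is a placeholder file name):
# comparem2 --config add_ncbi="$(join_accessions accessions.txt)"
```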
Output directory (output_directory)
Where results are written. Default: results_comparem2.
comparem2 --config output_directory="my_results"
Annotation tool (annotator)
CompareM2 ships with two annotators:
- bakta (default): recommended for bacteria
- prokka: also supports archaea
The choice of annotator affects many downstream analyses, as tools like panaroo, eggnog, and interproscan consume its output.
comparem2 --config annotator="prokka"
Report title (title)
Custom title for the HTML report. Defaults to the name of the current working directory.
comparem2 --config title="my_title"
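Since the report is named report_<title>.html (see Output structure) and the title defaults to the working directory name, the report path of a default run can be derived in advance, for example to open it from a script. This sketch assumes the directory name is used verbatim, without any sanitization of special characters; `default_report_path` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the report path for a run that does not set title.
# Assumes the title equals the working directory's name, used verbatim.
# The optional argument is the output directory (default: results_comparem2).
default_report_path() {
  local outdir="${1:-results_comparem2}"
  printf '%s/report_%s.html\n' "$outdir" "$(basename "$PWD")"
}
```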
Passthrough arguments
CompareM2 can forward arbitrary command-line arguments to any underlying tool using a set_ prefix in the config. The syntax is:
set_<rule><option>=<value>
Where <rule> is the Snakemake rule name, <option> is the tool's command-line flag (including dashes), and <value> is the parameter value.
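Note that nothing separates <rule> from <option>: the option's leading dashes mark where the rule name ends. The hypothetical helper below simply concatenates the three parts, which makes that explicit and can be handy when scripting several overrides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a passthrough config entry of the form set_<rule><option>=<value>.
# The option keeps its leading dashes; no extra separator follows the rule name.
# Omitting the value yields an empty value, as used for flag-only arguments.
passthrough() {
  local rule="$1" option="$2" value="${3-}"
  printf 'set_%s%s=%s\n' "$rule" "$option" "$value"
}

# Example (requires CompareM2):
# comparem2 --config "$(passthrough prokka --kingdom archaea)" \
#   "$(passthrough panaroo --threshold 0.95)" --until panaroo fast
```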
Example: Set Prokka's --kingdom flag to archaea:
comparem2 --config set_prokka--kingdom=archaea
Flag-only arguments (no value) use an empty string:
comparem2 --config set_prokka--rfam=""
Multiple passthrough arguments can be combined:
comparem2 --config set_prokka--kingdom=archaea set_panaroo--threshold=0.95 --until panaroo fast
Any passthrough argument that CompareM2 sets by default can be overridden by specifying the same set_ key on the command line.
Validating passthrough arguments
Combine -p (--printshellcmds) with --dry-run to preview the generated shell commands and verify that your arguments are being passed correctly:
comparem2 --config set_panaroo--threshold=0.99 --until panaroo -p --dry-run
#> [...]
#> panaroo \
#> -o results_comparem2/panaroo \
#> -t 16 \
#> --clean-mode sensitive \
#> --core_threshold 0.95 \
#> --threshold 0.99 \
#> [...]
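To check the printed commands in a script rather than by eye, the output can be scanned for an exact flag/value pair. This is a minimal sketch; `has_arg` is hypothetical, and it compares whitespace-separated fields, so --core_threshold 0.95 does not count as a match for --threshold 0.95.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a field equal to <flag> is immediately followed by a
# field equal to <value> anywhere in the text read from stdin (e.g. -p --dry-run output).
# awk reads all input before exiting, so the upstream command is not cut off early.
has_arg() {
  awk -v flag="$1" -v val="$2" '
    { for (i = 1; i < NF; i++) if ($i == flag && $(i + 1) == val) found = 1 }
    END { exit !found }'
}

# Example (requires CompareM2):
# comparem2 --config set_panaroo--threshold=0.99 --until panaroo -p --dry-run \
#   | has_arg --threshold 0.99 && echo "threshold passed through"
```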
Snakemake options
--until RULE [RULE2]...
Run only the specified rule(s) and their dependencies. Multiple rules can be listed. Available rules:
abricate amrfinder annotate antismash assembly_stats bakta bootstrap_mashtree carveme checkm2 copy dbcan eggnog fasttree gapseq_find gapseq_fill gtdbtk interproscan iqtree kegg_pathway mashtree mlst panaroo prokka sequence_lengths snp_dists treecluster
Download rules: antismash_download bakta_download checkm2_download dbcan_download eggnog_download gtdb_download
Pseudo-rules
Pseudo-rules are shortcuts that run a curated set of analyses:
| Pseudo-rule | Description | Included analyses |
|---|---|---|
| fast | Completes in seconds; useful for testing | sequence_lengths, assembly-stats, mashtree |
| meta | Analyses relevant for MAGs | annotation, assembly-stats, sequence_lengths, checkm2, eggnog, kegg_pathway, dbcan, interproscan, gtdbtk, mashtree |
| isolate | Analyses relevant for clinical isolates | annotation, assembly-stats, sequence_lengths, eggnog, kegg_pathway, gtdbtk, mlst, amrfinder, panaroo, fasttree, snp-dists, mashtree |
| downloads | Download and set up all databases | All database download rules |
| report | Re-render the report only | Report generation |
Usage: comparem2 --until meta or comparem2 --until isolate
--forcerun RULE [RULE2]...
Force re-execution of completed rules. Necessary when changing config parameters for a rule that has already run.
--printshellcmds, -p
Print the generated shell command for each rule.
--dry-run
Show what would run without executing anything.
CompareM2-specific options
These options do not invoke the Snakemake pipeline.
| Option | Description |
|---|---|
| --downloads | Download all databases without running analyses |
| --status | Show completion status of each rule in the current project directory |
| --version, -v | Show version |
| --help, -h | Show help |
| --cite | Show citation information |
Output structure
CompareM2 writes all results to the output directory (default: results_comparem2/). Per-sample results are in samples/<sample>/, and cross-sample results are written directly to the root of the output directory.
The report is named report_<title>.html, where <title> defaults to the current working directory name.
results_comparem2/
├── amrfinder/
├── assembly-stats/
├── benchmarks/
├── checkm2/
├── fasttree/
├── gtdbtk/
├── iqtree/
├── kegg_pathway/
├── mashtree/
├── metadata.tsv
├── mlst/
├── panaroo/
├── report_<title>.html
├── samples/
│ └── <sample>/
│ ├── <sample>.fna
│ ├── antismash/
│ ├── bakta/
│ ├── dbcan/
│ ├── eggnog/
│ ├── gapseq/
│ ├── interproscan/
│ ├── prokka/
│ └── sequence_lengths/
├── snp-dists/
├── treecluster/
├── visuals/
└── version_info.txt
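Because per-sample outputs follow a fixed samples/<sample>/<tool>/ layout, a quick way to see which samples have output from a given tool is to glob over that layout. This is a sketch under that assumption; `samples_with` is a hypothetical helper, not a CompareM2 command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list samples whose per-sample folder contains a given tool's output,
# assuming the layout <output_directory>/samples/<sample>/<tool>/ shown above.
samples_with() {
  local outdir="$1" tool="$2" d
  for d in "$outdir"/samples/*/"$tool"/; do
    # If nothing matches, the glob stays literal and fails this directory test.
    [ -d "$d" ] || continue
    basename "$(dirname "$d")"
  done
}

# Example: which samples have Bakta annotations?
# samples_with results_comparem2 bakta
```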
For per-tool file details, consult the respective tool's documentation.