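Snakemake - Code Snippet Templates (VS Code)

The Snakemake snippets are organized by dimension: each snippet provides a single main skeleton, and alternative use cases sit side by side as comments, to be uncommented as needed. The default combination is container (Docker) + shell.

Use together with the companion post snakemake介绍参数扩展实例 (an introduction to Snakemake with parameters, extensions, and examples).

Note: User Snippets are the user-defined code snippets of VS Code / Cursor; type a prefix and press Tab to expand it.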


1. Configuration

Global User Snippets (already configured), available in any project:

| Editor | Path |
| --- | --- |
| Cursor | ~/Library/Application Support/Cursor/User/snippets/ |
| VS Code | ~/Library/Application Support/Code/User/snippets/ |

| File | Prefixes |
| --- | --- |
| snakemake.code-snippets | smk-pipeline / smk-rule / smk-cluster / smk-run |
| wdl.code-snippets | wdl-file / wdl-workflow / wdl-task / wdl-tooling |

Language associations (User Settings):

"files.associations": {
    "*.smk": "python",
    "Snakefile": "python",
    "*.wdl": "wdl"
}

To modify a snippet: Command Palette → Snippets: Configure User Snippets → select the corresponding file. Changes take effect after Reload Window.


2. Snippet quick reference

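| Prefix | Dimension | Description |
| --- | --- | --- |
| smk-pipeline | Pipeline skeleton | File header + include of external rules + rule all |
| smk-rule | Compute rule | input/output/resources; execution body and runtime environment are switchable |
| smk-cluster | Cluster submission | cluster.yaml + SGE submission command |
| smk-run | Local run | Common snakemake commands |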

3. Main skeleton preview

3.1 smk-rule: compute rule

rule align:
    input:
        fq1="data/{sample}_1.fq.gz",
        fq2="data/{sample}_2.fq.gz",
    output:
        bam="results/{sample}.bam",
    params:
        ref="refs/genome.fa",
    log:
        "logs/align_{sample}.log",
    benchmark:
        "benchmarks/align_{sample}.txt",
    threads: 8
    resources:
        mem_mb=16000,
    # priority: 100
    # conda:
    #     "envs/align.yaml"  # add --use-conda at runtime
    container:
        # Registry images need the docker:// scheme; a bare name is read as a local image path
        "docker://biocontainers/bwa:v0.7.17_cv1"
    # singularity:
    #     "docker://biocontainers/bwa:v0.7.17_cv1"  # --use-singularity
    # singularity:
    #     "/path/to/bwa.sif"
    shell:
        """
        bwa mem -t {threads} {params.ref} {input.fq1} {input.fq2} \
            | samtools sort -@ {threads} -o {output.bam}
        """
    # script:
    #     "scripts/align.py"
    # run:
    #     with open(output.bam, "w") as out:
    #         out.write(process(input.fq1, input.fq2))

3.2 smk-pipeline: pipeline skeleton

# version 1.0
## language: Snakemake
## File : Pipeline.smk
## Time : 2023/02/14 16:06:20
## Author : Liu.Bo
## Version : v1.0
## Contact : liubo4@genomics.cn/614347533@qq.com
## WebSite : http://www.ben-air.cn/
##
## Snakemake version support :
## - Successfully tested on v7.x
##
## WORKFLOW DEFINITIONS
# description

import sys
import pandas as pd

configfile: "config/default.yaml"
config.setdefault("S", "samples.tsv")
config.setdefault("O", "./output")
config.setdefault("Analysis", "")
config.setdefault("UnAnalysis", "")

try:
    SAMPLE_FILE = config["S"]
    OUTDIR = config["O"]
except KeyError:
    sys.exit("Usage: snakemake -s Pipeline.smk -p -C S=sample.tsv O=outdir ...")

# --- Include rules from external .smk files ---
include: "rules/qc.smk"
include: "rules/align.smk"    # rule align
include: "rules/variant.smk"  # rule variant_call

df = pd.read_csv(SAMPLE_FILE, sep="\t", comment="#")
SAMPLES = df["Sample"].unique().tolist()

localrules: all

# --- Referencing rules from the included files ---
# Example input of variant_call in rules/variant.smk:
#     bam="results/{sample}.bam",      # output of rule align
#     # bam=rules.align.output.bam,    # explicit reference (Snakemake ≥6)

rule all:
    input:
        expand("results/{sample}.vcf.gz", sample=SAMPLES),

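How include references work: rules are defined in rules/*.smk and the main pipeline pulls them in with include. A dependency is established when a downstream rule's input path matches an upstream rule's output. Expand each rule body with smk-rule.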
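The path matching described above can be illustrated with a small Python sketch. It is a simplified model, not Snakemake's actual implementation: each rule's output template is converted to a regular expression, and a requested file is matched against it to find the producing rule and its wildcard values. The `RULES` table and the helpers `template_to_regex` and `find_producer` are hypothetical names introduced only for this example.

```python
import re

# Hypothetical rule table: rule name -> output template (mirrors the paths used above).
RULES = {
    "align": "results/{sample}.bam",
    "variant_call": "results/{sample}.vcf.gz",
}


def template_to_regex(template):
    """Turn 'results/{sample}.bam' into a regex with one named group per wildcard."""
    # re.split with a capture group yields: literal, wildcard, literal, wildcard, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)      # literal text
        else:
            pattern += f"(?P<{part}>.+)"    # wildcard name
    return re.compile(pattern)


def find_producer(path):
    """Return (rule name, wildcard values) for the first rule whose output matches."""
    for name, template in RULES.items():
        match = template_to_regex(template).fullmatch(path)
        if match:
            return name, match.groupdict()
    return None


print(find_producer("results/S1.bam"))
print(find_producer("results/S1.vcf.gz"))
```

When `variant_call` asks for `results/S1.bam` as input, the lookup resolves to `align` with `sample=S1`, which is the link that a path-based dependency relies on.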


4. snippets JSON

The global configuration file paths are listed in §1. The complete JSON follows (for backup / migration):

{
  "Snakemake Pipeline": {
    "prefix": "smk-pipeline",
    "body": [
      "import sys",
      "",
      "import pandas as pd",
      "# import datetime, subprocess",
      "",
      "configfile: \"config/default.yaml\"",
      "",
      "config.setdefault(\"S\", \"samples.tsv\")",
      "config.setdefault(\"O\", \"./output\")",
      "config.setdefault(\"Analysis\", \"\")",
      "config.setdefault(\"UnAnalysis\", \"\")",
      "",
      "# --- parameter check ---",
      "try:",
      "    SAMPLE_FILE = config[\"S\"]",
      "    OUTDIR = config[\"O\"]",
      "except KeyError:",
      "    sys.exit(",
      "        \"Usage: snakemake -s ${1:Pipeline.smk} -p -C S=sample.tsv O=outdir \"",
      "        'Analysis=\"QC,MSI\" UnAnalysis=\"\"'",
      "    )",
      "",
      "include: \"${2:rules/qc.smk}\"",
      "include: \"${3:rules/align.smk}\"",
      "include: \"${4:rules/variant.smk}\"",
      "",
      "# --- read the sample sheet ---",
      "df = pd.read_csv(SAMPLE_FILE, sep=\"\\t\", comment=\"#\")",
      "SAMPLES = df[\"Sample\"].unique().tolist()",
      "# SAMPLE_DICT = df.set_index(\"Sample\").to_dict(orient=\"index\")",
      "",
      "# --- branch control (enable as needed) ---",
      "# ANALYSIS = set(config[\"Analysis\"].split(\",\")) if config.get(\"Analysis\") else None",
      "# SKIP = set(config.get(\"UnAnalysis\", \"\").split(\",\"))",
      "# def should_run(step):",
      "#     if step in SKIP: return False",
      "#     if ANALYSIS and step not in ANALYSIS: return False",
      "#     return True",
      "",
      "localrules: all",
      "",
      "# --- rule chain (the rules themselves live in the included files) ---",
      "# rule align: ...",
      "# rule variant_call:",
      "#     input:",
      "#         bam=\"results/{sample}.bam\",",
      "#         # bam=rules.align.output.bam,",
      "rule all:",
      "    input:",
      "        expand(\"results/{sample}.vcf.gz\", sample=SAMPLES),"
    ],
    "description": "Pipeline skeleton: config check + include + rule chain + rule all"
  },
  "Snakemake Rule": {
    "prefix": "smk-rule",
    "body": [
      "rule ${1:name}:",
      "    input:",
      "        ${2:in}=\"${3:data/{sample\\}.fq.gz}\",",
      "    output:",
      "        ${4:out}=\"${5:results/{sample\\}.bam}\",",
      "    params:",
      "        ${6:ref}=\"${7:refs/genome.fa}\",",
      "    log:",
      "        \"logs/${1:name}_{sample}.log\",",
      "    benchmark:",
      "        \"benchmarks/${1:name}_{sample}.txt\",",
      "    threads: ${8:4}",
      "    resources:",
      "        mem_mb=${9:8000},",
      "    # priority: 100",
      "    # conda:",
      "    #     \"envs/${1:name}.yaml\"  # --use-conda",
      "    container:",
      "        \"${10:docker://biocontainers/tool:latest}\"",
      "    # singularity:",
      "    #     \"docker://biocontainers/tool:latest\"  # --use-singularity",
      "    # singularity:",
      "    #     \"/path/to/tool.sif\"",
      "    shell:",
      "        \"\"\"",
      "        ${11:command}",
      "        \"\"\"",
      "    # script:",
      "    #     \"scripts/${1:name}.py\"",
      "    # run:",
      "    #     with open(output.${4:out}, \"w\") as fh:",
      "    #         fh.write(process(input.${2:in}))"
    ],
    "description": "Compute rule: container + shell by default, other scenarios as comments"
  },
  "Snakemake Cluster": {
    "prefix": "smk-cluster",
    "body": [
      "# --- cluster.yaml ---",
      "__default__:",
      "  queue: \"${1:b2c_rd.q}\"",
      "  project: \"${2:P18Z11900N0299}\"",
      "  mem: \"4G\"",
      "  cores: 4",
      "",
      "${3:align}:",
      "  mem: \"16G\"",
      "  cores: 8",
      "  output: \"cluster_logs/{rule}.{wildcards}.o\"",
      "  error: \"cluster_logs/{rule}.{wildcards}.e\"",
      "",
      "# --- submission command ---",
      "snakemake -s ${4:Pipeline.smk} -p \\\\",
      "    -C S=${5:sample.tsv} O=${6:./output} \\\\",
      "    --cluster \"qsub -clear -V -cwd -P {cluster.project} -q {cluster.queue} \\\\",
      "        -l num_proc={threads} -l vf={resources.mem_mb}M -binding linear:{threads}\" \\\\",
      "    --cluster-config cluster.yaml \\\\",
      "    --jobs ${7:500} --rerun-incomplete --restart-times 5 --keep-going"
    ],
    "description": "cluster.yaml + SGE cluster submission"
  },
  "Snakemake Run Local": {
    "prefix": "smk-run",
    "body": [
      "snakemake -s ${1:Pipeline.smk} -p \\\\",
      "    -C S=${2:sample.tsv} O=${3:./output} \\\\",
      "    --cores ${4:8} \\\\",
      "    --rerun-incomplete --restart-times 5 --keep-going \\\\",
      "    --latency-wait 60 --stats runtime.json",
      "",
      "# --dag 2>/dev/null | dot -Tsvg > dag.svg",
      "# --dry-run",
      "# --use-singularity --singularity-args \"-B /path/to/mount\"",
      "# --use-conda"
    ],
    "description": "Local run command (optional flags as comments)"
  }
}
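After a backup or migration it can help to confirm that the file still parses and exposes the expected prefixes. The following Python sketch is one possible check, under the assumption that the file is plain JSON without comments (as in the listing above) and that every `prefix` is a single string. `summarize_snippets` and the reduced `SAMPLE` document are hypothetical and exist only for this illustration.

```python
import json


def summarize_snippets(text):
    """Return {prefix: description} for every snippet in a .code-snippets document."""
    snippets = json.loads(text)
    summary = {}
    for name, spec in snippets.items():
        # Every snippet needs at least a prefix and a body to be usable.
        missing = {"prefix", "body"} - spec.keys()
        if missing:
            raise ValueError(f"snippet {name!r} lacks {sorted(missing)}")
        summary[spec["prefix"]] = spec.get("description", "")
    return summary


# Reduced stand-in for snakemake.code-snippets (assumption: same layout as above).
SAMPLE = """
{
  "Snakemake Run Local": {
    "prefix": "smk-run",
    "body": ["snakemake -s ${1:Pipeline.smk} -p --cores ${2:8}"],
    "description": "local run"
  }
}
"""

print(summarize_snippets(SAMPLE))
```

To check the real file, read its text from one of the paths in §1 (for example with `pathlib.Path(...).read_text(encoding="utf-8")`) and pass it to the same function.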

5. Switching between scenarios

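| Dimension | Default | Commented alternatives |
| --- | --- | --- |
| Pipeline orchestration | config check + include + rule chain + rule all | Branch control via Analysis/UnAnalysis, run logging |
| Rule reference | input path matches the upstream output | Explicit reference with rules.<name>.output.<key> |
| Execution body | shell | script (external script), run (inline Python) |
| Runtime environment | container (Docker) | singularity (sif or docker://), conda |
| Auxiliary | log + benchmark | priority, protected() outputs |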
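The Analysis/UnAnalysis branch control listed in the table ships commented out in the smk-pipeline snippet. Below is a minimal Python sketch of how such a switch could select the targets handed to rule all. It runs outside Snakemake, and the step names and output patterns in `STEP_TARGETS` are hypothetical (only "QC" and "MSI" appear in the usage string of the snippet); `parse_steps` and `collect_targets` are likewise illustrative helpers, not part of the original snippet.

```python
def parse_steps(value):
    """Split a comma-separated list such as 'QC,MSI' into a set, ignoring blanks."""
    return {item.strip() for item in (value or "").split(",") if item.strip()}


def should_run(step, config):
    """A step runs unless it is skipped, or an allow-list exists that omits it."""
    skip = parse_steps(config.get("UnAnalysis"))
    only = parse_steps(config.get("Analysis"))
    if step in skip:
        return False
    if only and step not in only:
        return False
    return True


# Hypothetical mapping: step name -> output pattern contributed to rule all.
STEP_TARGETS = {
    "QC": "results/{sample}.qc.html",
    "MSI": "results/{sample}.msi.txt",
    "variant": "results/{sample}.vcf.gz",
}


def collect_targets(config, samples):
    """Build the target list for the enabled steps, one file per sample."""
    return [
        pattern.format(sample=sample)
        for step, pattern in STEP_TARGETS.items()
        if should_run(step, config)
        for sample in samples
    ]


print(collect_targets({"Analysis": "QC,MSI", "UnAnalysis": "MSI"}, ["S1"]))
```

Inside a Snakefile the resulting list could be passed as the input of rule all, with `config` filled from `-C Analysis=... UnAnalysis=...`. An empty Analysis value is treated as "run everything that is not skipped", which matches the intent of the commented `should_run` in the snippet.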

For Singularity bind mounts, see 使用容器-singularity (Using containers: Singularity).


6. Related documents
