
Close assembly gaps using long-reads with focus on correctness.
Today, many genome sequencing projects have been conducted using
second-generation sequencers, which produce short reads. Such assemblies have
many gaps. dentist closes these gaps using a (small) set of long reads.
Furthermore, it can be used to scaffold contigs freely using a set of long
reads. This can be used to fix known scaffolding errors or to further scaffold
output of a long-read assembly pipeline.
Make sure Singularity is installed on your system. You can then use the container like so:
    # launch an interactive shell
    singularity shell docker://aludi/dentist:latest

    # execute a single command inside the container
    singularity exec docker://aludi/dentist:latest dentist --version

    # run the whole workflow on a cluster using Singularity
    snakemake --configfile=snakemake.yml --use-singularity --profile=slurm
The last command is explained in more detail below in the usage section.
Download the latest pre-built binaries from the releases page
(https://github.com/a-ludi/dentist/releases) and extract the contents. The
tarball contains a dentist binary as well as the Snakemake workflow, example
config files and this README. In short, everything you need to run DENTIST.
Be sure to install the D package manager DUB. Install using either
    dub install dentist
or
    git clone https://github.com/a-ludi/dentist.git
    cd dentist
    dub build
The following software packages are required to run dentist:
Manage sequences (reads and assemblies) in 4bit encoding alongside auxiliary information such as masks or QV tracks
Find significant local alignments.
Find alignment chains, i.e. sequences of significant local alignments possibly with unaligned gaps.
Discover tandem repeats.
Estimate coverage and compute QVs.
Compute reference-based consensus sequence for gap filling.
Please see their own documentation for installation instructions. Note, the available packages on Bioconda are outdated and should not be used at the moment.
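Before running the workflow it can help to confirm that the external tools are actually on the PATH. The following is a minimal sketch, not part of DENTIST: `check_tools` is a hypothetical helper, and the tool names in the usage comment are only the ones this README mentions by name (dentist, daligner and damapper). Extend the list with whichever executables your setup requires.

```sh
#!/bin/sh
# check_tools NAME...
# Hypothetical helper (not shipped with DENTIST): reports every command that
# cannot be found on PATH and returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the tool list is an assumption and should be extended:
#   check_tools dentist daligner damapper || echo "install the missing tools first"
```

The function only checks for presence; it does not verify versions.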
Please use the following versions in your dependencies in case you experience troubles:
Suppose we have the genome assembly reference.fasta that is to be updated
and a set of reads reads.fasta with 25× coverage.
TL;DR
snakemake --configfile=snakemake.yml --use-singularity --profile=slurm
Install Snakemake version >=5.11.2 and copy these files into your working directory:
- ./snakemake/Snakefile
- ./snakemake/workflow_helper.py
- ./snakemake/snakemake.example.yml → ./snakemake/snakemake.yml
Next edit snakemake.yml to fit your needs and optionally test your
configuration with
snakemake --configfile=snakemake.yml -- extend_dentist_config
If no errors occurred the whole workflow can be executed using
snakemake --configfile=snakemake.yml
For small genomes of a few hundred Mbp this should run on a regular workstation.
One may use Snakemake's --jobs to run independent jobs in parallel. Larger
data sets may require a cluster in which case you can use Snakemake's
cloud or cluster facilities.
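The steps above can be combined into a small wrapper. This is a sketch under stated assumptions: `prepare_workdir`, `run_step` and `run_dentist_local` are hypothetical names, the DENTIST sources or release tarball are assumed to be unpacked in the directory passed as the first argument, and the files are copied directly into the working directory so that `--configfile=snakemake.yml` resolves. Setting `DRY_RUN=1` prints the Snakemake commands instead of executing them.

```sh
#!/bin/sh
# Copy the workflow files from an unpacked DENTIST directory ($1) into the
# working directory ($2, default: current directory). An existing
# snakemake.yml is kept so that local edits are not overwritten.
prepare_workdir() {
    src=$1
    dest=${2:-.}
    cp "$src/snakemake/Snakefile" "$src/snakemake/workflow_helper.py" "$dest/" || return 1
    if [ ! -e "$dest/snakemake.yml" ]; then
        cp "$src/snakemake/snakemake.example.yml" "$dest/snakemake.yml" || return 1
    fi
}

# Run a command, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Validate the configuration first, then execute the workflow with up to
# $2 independent jobs in parallel (default: 1).
run_dentist_local() {
    config=${1:-snakemake.yml}
    jobs=${2:-1}
    run_step snakemake --configfile="$config" -- extend_dentist_config || return 1
    run_step snakemake --configfile="$config" --jobs "$jobs"
}
```

For example, `prepare_workdir path/to/dentist . && run_dentist_local snakemake.yml 8` would copy the files, check the configuration and then run up to eight jobs in parallel. The workflow is not started if the configuration check fails.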
Executing on a Cluster
To make execution on a cluster easy DENTIST comes with example files to make Snakemake use SLURM via DRMAA. Please read the documentation of Snakemake if this does not suit your needs. Another good starting point is https://github.com/snakemake-profiles/doc.
Start by copying these files to your working directory:
- ./snakemake/profile-slurm.yml → ~/.config/snakemake/<profile>/config.yaml
- ./snakemake/cluster.example.yml → ./snakemake/cluster.yml
Next adjust the profile according to your cluster. This should enable
Snakemake to submit and track jobs on your cluster. You may use the
configuration values specified in cluster.yml to configure job names and
resource allocation for each step of the pipeline. Now, submit the workflow
to your cluster by
snakemake --configfile=snakemake.yml --profile=<profile>
Note, parameters specified in the profile provide default values and can be overridden by specifying a different value on the CLI.
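The cluster setup can be sketched as follows. `install_slurm_profile` is a hypothetical helper, the profile name is free to choose, and placing `cluster.yml` directly in the working directory is an assumption about how the copy instructions are meant. The example files still need to be adjusted to your cluster afterwards.

```sh
#!/bin/sh
# install_slurm_profile DENTIST_DIR PROFILE [WORKDIR]
# Hypothetical helper: installs the shipped SLURM profile under the given
# profile name and copies the example cluster configuration into WORKDIR
# (default: current directory).
install_slurm_profile() {
    src=$1
    profile=$2
    workdir=${3:-.}
    profile_dir="$HOME/.config/snakemake/$profile"
    mkdir -p "$profile_dir" || return 1
    cp "$src/snakemake/profile-slurm.yml" "$profile_dir/config.yaml" || return 1
    cp "$src/snakemake/cluster.example.yml" "$workdir/cluster.yml"
}

# Afterwards (left as comments because these commands submit jobs):
#   snakemake --configfile=snakemake.yml --profile=slurm
# A value given on the command line takes precedence over the profile, e.g.
# (assuming the profile defines a default for the number of jobs):
#   snakemake --configfile=snakemake.yml --profile=slurm --jobs 20
```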
Please inspect the Snakemake workflow to get all the details. It might be
useful to execute Snakemake with the -p switch which causes Snakemake to
print the shell commands. If you plan to write your own workflow management
for DENTIST please feel free to contact the maintainer!
DENTIST comprises a complex pipeline with many options for tweaking. This section points out some important parameters and their effect on the result.
How to Choose DENTIST Parameters
The following list comprises the important/influential parameters for DENTIST itself. Please keep in mind that the alignments generated by daligner/damapper have immense influence on the performance of DENTIST.
--max-insertion-error: Strong influence on quality and sensitivity. Lower
values lead to lower sensitivity but higher quality. The maximum recommended value is 0.05.
--min-anchor-length: Higher values result in higher accuracy but lower
sensitivity. In particular, large gaps cannot be closed if the value is too
high. Usually the value should be at least 500 and up to 10_000.
--reference-error, --reads-error: Determines the -e parameter for
daligner/damapper. Use dentist generate-dazzler-options to see the effect
of these parameters or consult commandline.d in the
source code.
--min-reads-per-pile-up: Choosing higher values for the minimum number of
reads drastically reduces sensitivity but has little effect on the quality.
Small values may be chosen to get the maximum sensitivity in de novo
assemblies. Make sure to thoroughly validate the results though.
--min-spanning-reads: Higher values give more confidence in the
correctness of closed gaps but reduce sensitivity. The value must be well
below the expected coverage.
--allow-single-reads: May be used under careful consideration. This is
intended for one of the following scenarios:
--existing-gap-bonus: If DENTIST finds evidence to join two contigs that
are already consecutive in the input assembly (i.e. joined by Ns) then this
join will be preferred over conflicting joins (if present) with this bonus.
The default value is rather conservative, i.e. the preferred join almost
always wins over other joins in case of a conflict.
--join-policy: Choose according to your needs:
- scaffoldGaps: Closes only gaps that are marked by Ns in the assembly.
  This is the default mode of operation. Use this if you do not want to alter
  the scaffolding of the assembly. See also --existing-gap-bonus.
- scaffolds: Allows whole scaffolds to be joined in addition to the effects
  of scaffoldGaps. Use this if you have (many) scaffolds that are not yet
  full chromosome-scale.
- contigs: Allows contigs to be rearranged freely. This is especially
  useful in de novo assemblies before applying any other scaffolding
  methods as it increases the contiguity thus increasing the chance that
  large-scale scaffolding (e.g. Bionano or Hi-C) finds proper joins.
Choosing the Read Type
In the examples PacBio long reads are assumed but DENTIST can be run using any
kind of long reads. Currently, this is either PacBio or Oxford Nanopore reads.
To use non-PacBio reads, the reads_type in snakemake.yml must be set
to anything other than PACBIO_SMRT. The recommendation is to use
OXFORD_NANOPORE for Oxford Nanopore. These names are borrowed from the NCBI.
Further details on the rationale can be found in https://github.com/a-ludi/dentist/issues/1#issuecomment-610764625.
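The numeric recommendations from "How to Choose DENTIST Parameters" can be checked mechanically before starting a long run. The following is a sketch, not part of DENTIST: `check_dentist_params` is a hypothetical helper, and reading "well below the expected coverage" as "at most half the coverage" is an assumption made only for illustration.

```sh
#!/bin/sh
# check_dentist_params MAX_INSERTION_ERROR MIN_ANCHOR_LENGTH MIN_SPANNING_READS COVERAGE
# Hypothetical helper: prints a message for every value outside the
# recommendations given above and returns non-zero if any was found.
check_dentist_params() {
    awk -v mie="$1" -v mal="$2" -v msr="$3" -v cov="$4" 'BEGIN {
        ok = 1
        if (mie + 0 > 0.05) {
            print "max-insertion-error exceeds the recommended maximum of 0.05"
            ok = 0
        }
        if (mal + 0 < 500 || mal + 0 > 10000) {
            print "min-anchor-length is outside the usual range of 500 to 10000"
            ok = 0
        }
        # Assumption: "well below" is read as "at most half the coverage".
        if (2 * msr > cov + 0) {
            print "min-spanning-reads is not well below the expected coverage"
            ok = 0
        }
        exit (ok ? 0 : 1)
    }'
}
```

With the 25× coverage from the example scenario, `check_dentist_params 0.05 500 3 25` passes, whereas raising the number of spanning reads to 20 is reported as a problem.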
Arne Ludwig, Martin Pippel, Gene Myers, Michael Hiller. DENTIST – close assembly gaps with high confidence. In preparation.
DENTIST is being developed by Arne Ludwig at the Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
Contributions are warmly welcome. Just create an issue (https://github.com/a-ludi/dentist/issues) or a pull request (https://github.com/a-ludi/dentist/pulls) on GitHub. If you submit a pull request please make sure that:
- dub test runs successfully.
It is recommended to install the Git hooks included in the repository to avoid premature pull requests. You can enable all shipped hooks with this command:
    git config --local core.hooksPath .githooks/
If you want to enable just a subset, use ln -s .githooks/{hook} .git/hooks. If you want to audit code changes before they get executed on your machine you can use cp .githooks/{hook} .git/hooks instead.
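Enabling a single hook by symlink can be sketched as below. `enable_hook` is a hypothetical helper that must be run from the repository root, and the hook name is whatever file exists under .githooks/. Note that a relative symlink target is resolved from the directory containing the link, so this sketch points the link at `../../.githooks/{hook}`.

```sh
#!/bin/sh
# enable_hook HOOK
# Hypothetical helper: links one hook from .githooks/ into .git/hooks/.
# Run from the repository root.
enable_hook() {
    hook=$1
    if [ ! -f ".githooks/$hook" ]; then
        echo "no such hook: .githooks/$hook" >&2
        return 1
    fi
    # The target is relative to .git/hooks/, hence the leading ../..
    ln -s "../../.githooks/$hook" ".git/hooks/$hook"
}
```

To audit changes before they run, replace the `ln -s` line with a plain `cp ".githooks/$hook" ".git/hooks/$hook"` so that later updates to the hook are not picked up automatically.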
This project is licensed under MIT License (see LICENSE).