
dimeng851/embedding

This project offers three types of protein sequence embedding methods:
1️⃣ Onehot: Encode your protein sequences into one-hot representations.
2️⃣ ProtTrans: Utilize ProtTrans to embed your protein sequences.
3️⃣ MSA Transformer: Employ MSA Transformer for embedding.
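To make the first method concrete, here is a minimal sketch of one-hot encoding for a protein sequence. It is not taken from this project's `main.py`: the 20-letter alphabet, its ordering, and the all-zero row for non-standard residues are assumptions made for illustration, and the image's own encoding may differ.

```python
# Assumption: the 20 standard amino acids in alphabetical one-letter order.
# The project's own alphabet and ordering may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Return one row per residue, with a single 1 at the residue's index."""
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:  # unknown residues (e.g. X) stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


encoded = one_hot("MWERL")
print(len(encoded), len(encoded[0]))  # 5 20
```

Each sequence of length L becomes an L x 20 matrix under these assumptions.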
.fasta file: put the query sequences into one FASTA file.
e.g.

```sh
>DP02585
MWERLNCAAEDFYSRLLQKFNEEKKGIRKDPFLYEADVQVQLISKGQPNPLKNILNENDIVFIVEKVPLEKEETSHIEELQSEETAISDFSTGENVGPLALPVGKARQLIGLYTMAHNPNMTHLKINLPVTALPPLWVRCDSSDPEGTCWLGAELITTNNSITGIVLYVVSCKADKNYSVNLENLKNLHKKRHHLSTVTSKGFAQYELFKSSALDDTITASQTAIALDISWSPVDEILQIPPLSSTATLNIKVESGEPRGPLNHLYRELKFLLVLADGLRTGVTEWLEPLEAKSAVELVQEFLNDLNKLDGFGDSTKKDTEVETLKHDTAAVDRSVKRLFKVRSDLDFAEQLWCKMSSSVISYQDLVKCFTLIIQSLQRGDIQPWLHSGSNSLLSKLIHQSYHGTMDTVSLSGTIPVQMLLEIGLDKLKKDYISFFIGQELASLNHLEYFIAPSVDIQEQVYRVQKLHHILEILVSCMPFIKSQHELLFSLTQICIKYYKQNPLDEQHIFQLPVRPTAVKNLYQSEKPQKWRVEIYSGQKKIKTVWQLSDSSPIDHLNFHKPDFSELTLNGSLEERIFFTNMVTCSQVHFK
>DP02606
MSRQSSVSFRSGGSRSFSTASAITPSVSRTSFTSVSRSGGGGGGGFGRVSLAGACGVGGYGSRSLYNLGGSKRISISTSGGSFRNRFGAGAGGGYGFGGGAGSGFGFGGGAGGGFGLGGGAGFGGGFGGPGFPVCPPGGIQEVTVNQSLLTPLNLQIDPSIQRVRTEEREQIKTLNNKFASFIDKVRFLEQQNKVLDTKWTLLQEQGTKTVRQNLEPLFEQYINNLRRQLDSIVGERGRLDSELRNMQDLVEDFKNKYEDEINKRTTAENEFVMLKKDVDAAYMNKVELEAKVDALMDEINFMKMFFDAELSQMQTHVSDTSVVLSMDNNRNLDLDSIIAEVKAQYEEIANRSRTEAESWYQTKYEELQQTAGRHGDDLRNTKHEISEMNRMIQRLRAEIDNVKKQCANLQNAIADAEQRGELALKDARNKLAELEEALQKAKQDMARLLREYQELMNTKLALDVEIATYRKLLEGEECRLSGEGVGPVNISVVTSSVSSGYGSGSGYGGGLGGGLGGGLGGGLAGGSSGSYYSSSSGGVGLGGGLSVGGSGFSASSGRGLGVGFGSGGGSSSSVKFVSTTSSSRKSFKS
...
```
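Because the sequence IDs in the FASTA headers are reused for the `.a3m` and `.npy` file names, it can help to list them before running the container. The following is a minimal sketch of a FASTA reader, not the project's own parser. Taking the first whitespace-delimited token of each header as the ID is an assumption, and the demo sequences are shortened prefixes of the two records above.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {sequence_id: sequence}."""
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an ID")
            # Assumption: the ID is the first whitespace-delimited token.
            current_id = header[0]
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


# Shortened prefixes of the two example records, wrapped over two lines.
example = [">DP02585", "MWERLNCAAEDFY", "SRLLQKF", ">DP02606", "MSRQSSVSFRSGG"]
records = read_fasta(example)
print(list(records))  # ['DP02585', 'DP02606']
```

For a real file, pass an open file handle, e.g. `read_fasta(open(path))`.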
.a3m files (for MSA Transformer only)
Generate them with HHblits.
[SEQUENCE_NAME/ID].a3m: replace SEQUENCE_NAME/ID with the actual sequence ID, which must be the same as the name in the .fasta file.
e.g. DP02585.a3m and DP02606.a3m
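Since each `.a3m` file name has to match a sequence ID, a quick pre-flight check can catch missing alignments before the container starts. This is a hedged sketch: `missing_a3m` is a hypothetical helper, not part of the image, and it assumes all `.a3m` files sit directly in one folder.

```python
import tempfile
from pathlib import Path


def missing_a3m(sequence_ids, a3m_folder):
    """Return the IDs that have no <ID>.a3m file directly inside a3m_folder."""
    folder = Path(a3m_folder)
    return [seq_id for seq_id in sequence_ids
            if not (folder / f"{seq_id}.a3m").is_file()]


# Demo in a throwaway folder that holds only one of the two alignments.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "DP02585.a3m").write_text(">DP02585\nMWERL\n")
    print(missing_a3m(["DP02585", "DP02606"], tmp))  # ['DP02606']
```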
[SEQUENCE_NAME/ID].npy files.
Pull the Docker image from DockerHub
```sh
docker pull dimeng851/embedding:v3
```
Edit the embedding methods in the Dockerfile
Default: apply all three embedding methods: 1️⃣ onehot, 2️⃣ protTrans, and 3️⃣ MSA Transformer. If you want to generate embeddings from only one or two of the methods:
a. Open the Dockerfile.
b. Delete the embedding methods you don't want from:
```sh
CMD python /embedding/main.py --embeddingType onehot,protTrans,msaTrans
```
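The `--embeddingType` value is a comma-separated list of method names. As an illustration of how the edited line might be assembled, the sketch below builds that value from a chosen subset. `embedding_type_arg` is a hypothetical helper; only the three method names and the `CMD` line come from the Dockerfile. Whether `main.py` cares about the order of the names is not documented here, so the sketch simply keeps the original order.

```python
# The three method names as they appear in the Dockerfile CMD line.
SUPPORTED = ("onehot", "protTrans", "msaTrans")


def embedding_type_arg(selected):
    """Build the comma-separated --embeddingType value for the selected methods."""
    unknown = [name for name in selected if name not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported embedding method(s): {unknown}")
    if not selected:
        raise ValueError("select at least one embedding method")
    # Keep the order used in the original CMD line.
    return ",".join(name for name in SUPPORTED if name in selected)


print("CMD python /embedding/main.py --embeddingType "
      + embedding_type_arg(["protTrans", "onehot"]))
# CMD python /embedding/main.py --embeddingType onehot,protTrans
```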
Run the container
```sh
docker run -d \
    -it \
    --name CONTAINER_NAME \
    --mount type=bind,source=PATH_TO_INPUT_FASTA_FILE,target=/embedding/data/input.fasta \
    --mount type=bind,source=PATH_TO_INPUT_A3M_FOLDER,target=/embedding/data/hmm \
    --mount type=bind,source=PATH_TO_INPUT_OUTPUT_FOLDER,target=/embedding/data/output \
    --mount type=bind,source=PATH_TO_INPUT_TORCH_CHECKPOINT,target=/root/.cache/torch/hub/checkpoints/ \
    --mount type=bind,source=PATH_TO_INPUT_HUGGINGFACE_HUB,target=/root/.cache/huggingface/hub/ \
    dimeng851/embedding:v3
```
Please replace the following parts:
CONTAINER_NAME with any container name you like,
PATH_TO_INPUT_FASTA_FILE with the path to the input FASTA file,
PATH_TO_INPUT_A3M_FOLDER with the folder containing the HHblits search results (.a3m files are required),
PATH_TO_INPUT_OUTPUT_FOLDER with the folder where the embeddings should be written. Do not change the other parts.
PATH_TO_INPUT_TORCH_CHECKPOINT with the parent folder of the two MSA Transformer models (esm_msa1b_t12_100M_UR50S-contact-regression.pt and esm_msa1b_t12_100M_UR50S.pt) if you have already downloaded them; otherwise, a folder in which to save the pretrained models.
PATH_TO_INPUT_HUGGINGFACE_HUB with the parent folder of the ProtTrans model (models--Rostlab--prot_t5_xl_uniref50) if you have already downloaded it; otherwise, a folder in which to save the pretrained model.
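The placeholder substitution above can also be scripted. The sketch below assembles the same `docker run` argument list from five host paths and prints it without executing anything. `docker_run_args` is a hypothetical helper written for illustration; the container-side targets and the image tag are copied from the command above, and the demo paths are the ones used in this README's example.

```python
import shlex


def docker_run_args(container_name, fasta, a3m_dir, output_dir, torch_dir, hf_dir,
                    image="dimeng851/embedding:v3"):
    """Return the docker run argument list with the five bind mounts filled in."""
    # (host source, fixed container target) pairs; targets must not be changed.
    mounts = [
        (fasta, "/embedding/data/input.fasta"),
        (a3m_dir, "/embedding/data/hmm"),
        (output_dir, "/embedding/data/output"),
        (torch_dir, "/root/.cache/torch/hub/checkpoints/"),
        (hf_dir, "/root/.cache/huggingface/hub/"),
    ]
    args = ["docker", "run", "-d", "-it", "--name", container_name]
    for source, target in mounts:
        args += ["--mount", f"type=bind,source={source},target={target}"]
    args.append(image)
    return args


args = docker_run_args(
    "embed_con",
    "/home/dimeng/caid3/test.fasta",
    "/home/dimeng/project/linker_caid/a3m",
    "/home/dimeng/caid3/output/embedding",
    "/home/dimeng/.cache/torch/hub/checkpoints/",
    "/home/dimeng/.cache/huggingface/hub/",
)
print(shlex.join(args))  # prints the command as a single shell-quoted line
```

The list can be passed to `subprocess.run(args, check=True)` to launch the container. Note that with `--mount type=bind`, Docker by default reports an error if a source path does not exist on the host, so create the output and cache folders first.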
Here is an example:
```sh
docker run -d \
    -it \
    --name embed_con \
    --mount type=bind,source=/home/dimeng/caid3/test.fasta,target=/embedding/data/input.fasta \
    --mount type=bind,source=/home/dimeng/project/linker_caid/a3m,target=/embedding/data/hmm \
    --mount type=bind,source=/home/dimeng/caid3/output/embedding,target=/embedding/data/output \
    --mount type=bind,source=/home/dimeng/.cache/torch/hub/checkpoints/,target=/root/.cache/torch/hub/checkpoints/ \
    --mount type=bind,source=/home/dimeng/.cache/huggingface/hub/,target=/root/.cache/huggingface/hub/ \
    dimeng851/embedding:v3
```
Check the embedding results in the output folder you provided.
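One way to check the results is to confirm that every input sequence produced at least one `.npy` file. The sketch below is an assumption-laden illustration: it relies only on the `[SEQUENCE_NAME/ID].npy` naming stated earlier and searches the output folder recursively, because the folder layout per embedding method is not documented here. `ids_without_output` and the `onehot` subfolder in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def ids_without_output(sequence_ids, output_folder):
    """Return IDs for which no <ID>.npy file exists anywhere under output_folder."""
    found = {path.stem for path in Path(output_folder).rglob("*.npy")}
    return [seq_id for seq_id in sequence_ids if seq_id not in found]


# Demo with a throwaway folder; the "onehot" subfolder name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    subfolder = Path(tmp) / "onehot"
    subfolder.mkdir()
    (subfolder / "DP02585.npy").write_bytes(b"")  # empty placeholder file
    print(ids_without_output(["DP02585", "DP02606"], tmp))  # ['DP02606']
```

The arrays themselves can then be loaded with `numpy.load` for inspection.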
Here is some information about the Docker version used for this project:
```sh
Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb 6 21:13:26 2024
 OS/Arch:           darwin/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.27.2 (***)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb 6 21:14:25 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```




