
Model: Nsorana/my_awesome_model (Docker image: bytez/nsorana_my_awesome_model)
Task: text-classification
If you're just getting started, we recommend that you try out the Bytez Model Playground directly or use one of our Client Libraries to access the Bytez Inference API.
You'll receive 100 free credits of inference each month!
Javascript, Python, and Julia are currently supported.
You can play with models without having to write any code by visiting Bytez.
If that's not your cup of tea, keep reading!
Your API key will be front and center on your Bytez dashboard, with a copy button.
```bash
docker pull bytez/nsorana_my_awesome_model
```
```bash
docker run -it \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/nsorana_my_awesome_model
```
NOTE: you can adjust the port if needed via the -e PORT= environment variable and the -p option.
e.g. if you want to start the container on port 80, you'd do this instead:
```bash
docker run -it \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -e PORT=80 \
  -p 80:80 \
  bytez/nsorana_my_awesome_model
```
Send POST requests to the container and the model will reply.
```bash
curl --location 'http://0.0.0.0:8000/run' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "I am absolutely furious about the situation! How could you possibly let this happen? Your complete lack of responsibility and incompetence is mind-boggling. This has caused an enormous amount of stress and inconvenience for everyone involved, and it'\''s entirely unacceptable. I expect immediate action to rectify this mess. If this isn'\''t resolved promptly, there will be serious consequences. I'\''m beyond frustrated and utterly disappointed in your performance.",
    "params": {}
  }'
```
Note the `'\''` sequences: apostrophes inside a single-quoted shell string must be escaped, or the command will fail.
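Inline JSON with apostrophes is easy to get wrong inside a single-quoted shell string. As a sketch, you can instead write the request body to a file (the payload.json name here is just an example) and point curl at it:

```bash
# Write the request body to a file so apostrophes in the text
# don't fight with shell quoting (payload.json is an example name).
cat > payload.json <<'EOF'
{
  "text": "I'm beyond frustrated and utterly disappointed in your performance.",
  "params": {}
}
EOF

# Sanity-check that the file is valid JSON before sending it.
python3 -m json.tool payload.json > /dev/null && echo "payload ok"

# Then send it to the running container:
# curl --location 'http://0.0.0.0:8000/run' \
#   --header 'Content-Type: application/json' \
#   --data @payload.json
```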
To ensure that weights are saved locally between runs, you can specify a directory for where you want weights to be stored.
This is highly recommended for large models, as downloads can take hours.
Specify the caching directory via the -v option, as in the following command:
```bash
docker run -it \
  -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
Notice how in the command above we have -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model and -e HF_HOME=/server/model
The -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model option mounts the directory /PATH/TO/YOUR/CACHING/DIRECTORY/HERE into the Docker container's filesystem at /server/model.
-e HF_HOME=/server/model tells the code inside the container to load the model from that directory, i.e. from /server/model.
On my machine, the command looks like this:
```bash
docker run -it \
  -v /home/inf3rnus/models:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
To run on GPU(s), make sure you have the latest Nvidia drivers, CUDA, and the NVIDIA Container Toolkit installed (the toolkit is what lets Docker pass GPUs through to containers).
Then, simply run the command from above, but with --gpus all added to the list of docker options.
```bash
docker run -it \
  --gpus all \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
The two commands from above combined into one:
```bash
docker run -it \
  --gpus all \
  -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
The DEVICE environment variable controls where model weights are placed. It defaults to auto and can be one of:

-e DEVICE="auto": attempts to place the weights on the GPU if available, and then places them onto system RAM if there is not enough memory.
-e DEVICE="cuda": attempts to place the weights on the GPU.
-e DEVICE="cpu": attempts to place the weights on the CPU.
This gives you finer control over which device the model runs on. auto may split the model across system RAM and VRAM; you will often set DEVICE="cuda" to force the model entirely onto the GPU.
NOTE: some models only work with one specific setting (auto, cuda, or cpu).
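Putting it together, a run that forces GPU placement might look like this. This is a sketch using the same image and flags as the commands above, with DEVICE set explicitly:

```bash
docker run -it \
  --gpus all \
  -e DEVICE="cuda" \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```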
Hop into the Bytez *** for live support: the community is happy to help. If you don't have ***, email us.
The params object accepts Hugging Face generation settings, including:

- top_p: only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Default: 1.
- typical_p: only the smallest set of most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details. Default: 1.
- epsilon_cutoff: only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details. Default: 0.
- eta_cutoff: a token is only sampled if it is greater than eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details. Default: 0.
- diversity_penalty: subtracted from a beam's score if it generates a token also generated by another beam group; diversity_penalty is only effective if group beam search is enabled. Default: 0.
- length_penalty: exponential penalty to the length used with beam-based generation. length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences. Default: 1.
- bad_words_ids: lists of token ids that are not allowed to be generated. See [~generation.NoBadWordsLogitsProcessor] for further documentation and examples.
- force_words_ids: token ids that must be generated. If given List[List[int]], this is treated as a simple list of words that must be included, the opposite to bad_words_ids. If given List[List[List[int]]], this triggers a disjunctive constraint, where one can allow different forms of each word.
- renormalize_logits: whether to renormalize the logits after applying all logit processors or warpers. It is highly recommended to set this to True, as the search algorithms suppose the score logits are normalized but some logit processors or warpers break the normalization. Default: false.
- constraints: custom Constraint objects that the generated output must fulfill, in the most sensible way possible.
- forced_bos_token_id: the id of the token to force as the first generated token after decoder_start_token_id. Useful for multilingual models like mBART where the first generated token needs to be the target language token. Default: model.config.forced_bos_token_id.
- forced_eos_token_id: the id of the token to force as the last generated token when max_length is reached. Optionally, use a list to set multiple end-of-sequence tokens. Default: model.config.forced_eos_token_id.
- remove_invalid_values: whether to remove possible NaN and inf outputs of the model to prevent crashes; using remove_invalid_values can slow down generation. Default: model.config.remove_invalid_values.
- exponential_decay_length_penalty: a tuple (start_index, decay_factor) where start_index indicates where the penalty starts and decay_factor represents the factor of exponential decay.
- suppress_tokens: tokens to suppress during generation; the SuppressTokens logit processor will set their log probs to -inf so that they are not sampled.
- begin_suppress_tokens: tokens to suppress at the beginning of generation; the SuppressBeginTokens logit processor will set their log probs to -inf so that they are not sampled.
- forced_decoder_ids: a mapping from generation index to the token id forced at that position, e.g. [[1, 123]] means the second generated token will always be a token of index 123.
- guidance_scale: classifier-free guidance (CFG), enabled by setting guidance_scale > 1. Higher guidance scale encourages the model to generate samples that are more closely linked to the input prompt, usually at the expense of poorer quality.
- output_attentions: whether to return attention tensors; see attentions under returned tensors for more details. Default: false.
- output_hidden_states: whether to return hidden states; see hidden_states under returned tensors for more details. Default: false.
- output_scores: whether to return prediction scores; see scores under returned tensors for more details. Default: false.
- output_logits: whether to return unprocessed logits; see logits under returned tensors for more details.
- return_dict_in_generate: whether to return a [~utils.ModelOutput] instead of a plain tuple. Default: false.
- encoder_no_repeat_ngram_size: all ngrams of this size that occur in the encoder_input_ids cannot occur in the decoder_input_ids. Default: 0.
- decoder_start_token_id: the start token id for decoding, or a list of length batch_size. Indicating a list enables different start ids for each element in the batch (e.g. multilingual models with different target languages in one batch).
- num_assistant_tokens: higher values of num_assistant_tokens make the generation more speculative: if the assistant model is performant, larger speed-ups can be reached; if the assistant model requires lots of corrections, lower speed-ups are reached. Default: 5.
- num_assistant_tokens_schedule: heuristic: when all speculative tokens are correct, increase num_assistant_tokens by 2, else reduce by 1; the num_assistant_tokens value is persistent over multiple generation calls with the same assistant model. heuristic_transient: same as heuristic, but num_assistant_tokens is reset to its initial value after each generation call. constant: num_assistant_tokens stays unchanged during generation. Default: heuristic.
- Any other kwargs are forwarded to the generate function of the model; kwargs that are not present in generate's signature will be used in the model forward pass.

Full parameter list available here, courtesy of Hugging Face.
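Settings from the list above go in the request's params object. A minimal sketch follows; whether this particular text-classification model honors a sampling parameter like top_p is an assumption, and the file name is illustrative:

```bash
# Build a request body with a generation param set
# (top_p here is illustrative; this model may ignore it).
cat > payload_with_params.json <<'EOF'
{
  "text": "This product exceeded my expectations.",
  "params": { "top_p": 0.9 }
}
EOF

# Confirm the body is valid JSON before sending it.
python3 -m json.tool payload_with_params.json > /dev/null && echo "params payload ok"

# Then send it to the running container:
# curl --location 'http://0.0.0.0:8000/run' \
#   --header 'Content-Type: application/json' \
#   --data @payload_with_params.json
```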
Using models locally offers enhanced privacy, control, and customization for your projects. Happy building!