
Model: Nsorana/my_awesome_model (Docker image: bytez/nsorana_my_awesome_model)
Task: text-classification
If you're just getting started, we recommend that you try out the Bytez Model Playground directly or use one of our Client Libraries to access the Bytez Inference API.
You'll receive 100 free credits of inference each month!
Javascript, Python, and Julia are currently supported.
You can play with models without having to write any code by visiting Bytez.
If that's not your cup of tea, keep reading!
Your API key will be front and center on your Bytez dashboard, with a copy button.
```bash
docker pull bytez/nsorana_my_awesome_model
```
```bash
docker run -it \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/nsorana_my_awesome_model
```
NOTE: you can adjust the port if needed via the -e PORT= environment variable and the -p option.
e.g. if you want to start the container on port 80, you'd do this instead:
```bash
docker run -it \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -e PORT=80 \
  -p 80:80 \
  bytez/nsorana_my_awesome_model
```
Send POST requests to the container and the model will reply.
```bash
curl --location 'http://0.0.0.0:8000/run' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "I am absolutely furious about the situation! How could you possibly let this happen? Your complete lack of responsibility and incompetence is mind-boggling. This has caused an enormous amount of stress and inconvenience for everyone involved, and it'\''s entirely unacceptable. I expect immediate action to rectify this mess. If this isn'\''t resolved promptly, there will be serious consequences. I'\''m beyond frustrated and utterly disappointed in your performance.",
    "params": {}
  }'
```
Note the `'\''` sequences: apostrophes inside a single-quoted shell string must be escaped, or the command will fail.
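Inline JSON with apostrophes is easy to get wrong inside a single-quoted shell string. As a sketch, you can instead write the request body to a file (the payload.json name here is just an example) and point curl at it:

```bash
# Write the request body to a file so apostrophes in the text
# don't fight with shell quoting (payload.json is an example name).
cat > payload.json <<'EOF'
{
  "text": "I'm beyond frustrated and utterly disappointed in your performance.",
  "params": {}
}
EOF

# Sanity-check that the file is valid JSON before sending it.
python3 -m json.tool payload.json > /dev/null && echo "payload ok"

# Then send it to the running container:
# curl --location 'http://0.0.0.0:8000/run' \
#   --header 'Content-Type: application/json' \
#   --data @payload.json
```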
To ensure that weights are saved locally between runs, you can specify a directory for where you want weights to be stored.
This is highly recommended for large models, as downloads can take hours.
Specify the caching directory via the -v option, as in the following command:
```bash
docker run -it \
  -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
Notice how in the command above we have -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model and -e HF_HOME=/server/model
The -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model option mounts the directory /PATH/TO/YOUR/CACHING/DIRECTORY/HERE into the Docker container's filesystem at /server/model.
-e HF_HOME=/server/model tells the code inside the container to load the model from that directory, i.e. from /server/model.
On my machine, the command looks like this:
```bash
docker run -it \
  -v /home/inf3rnus/models:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
To run on GPU(s), make sure you have the latest Nvidia drivers, CUDA, and the NVIDIA Container Toolkit installed (the toolkit is what lets Docker pass GPUs through to containers).
Then, simply run the command from above, but with --gpus all added to the list of docker options.
```bash
docker run -it \
  --gpus all \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
The two commands from above combined into one:
```bash
docker run -it \
  --gpus all \
  -v /PATH/TO/YOUR/CACHING/DIRECTORY/HERE:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```
The DEVICE environment variable controls where model weights are placed. It defaults to auto and can be one of:

-e DEVICE="auto": attempts to place the weights on the GPU if available, and then places them onto system RAM if there is not enough memory.
-e DEVICE="cuda": attempts to place the weights on the GPU.
-e DEVICE="cpu": attempts to place the weights on the CPU.
This gives you finer control over which device the model runs on. auto may split the model across system RAM and VRAM; you will often set DEVICE="cuda" to force the model entirely onto the GPU.
NOTE: some models only work with one specific setting (auto, cuda, or cpu).
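Putting it together, a run that forces GPU placement might look like this. This is a sketch using the same image and flags as the commands above, with DEVICE set explicitly:

```bash
docker run -it \
  --gpus all \
  -e DEVICE="cuda" \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/nsorana_my_awesome_model
```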
Hop into the Bytez *** for live support: the community is happy to help. If you don't have ***, email us.
The params object accepts Hugging Face generation settings, including:

- top_p: only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. Default: 1.
- typical_p: only the smallest set of most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details. Default: 1.
- epsilon_cutoff: only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details. Default: 0.
- eta_cutoff: a token is only sampled if it is greater than eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details. Default: 0.
- diversity_penalty: subtracted from a beam's score if it generates a token also generated by another beam group; diversity_penalty is only effective if group beam search is enabled. Default: 0.
- length_penalty: exponential penalty to the length used with beam-based generation. length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences. Default: 1.
- bad_words_ids: lists of token ids that are not allowed to be generated. See [~generation.NoBadWordsLogitsProcessor] for further documentation and examples.
- force_words_ids: token ids that must be generated. If given List[List[int]], this is treated as a simple list of words that must be included, the opposite to bad_words_ids. If given List[List[List[int]]], this triggers a disjunctive constraint, where one can allow different forms of each word.
- renormalize_logits: whether to renormalize the logits after applying all logit processors or warpers. It is highly recommended to set this to True, as the search algorithms suppose the score logits are normalized but some logit processors or warpers break the normalization. Default: false.
- constraints: custom Constraint objects that the generated output must fulfill, in the most sensible way possible.
- forced_bos_token_id: the id of the token to force as the first generated token after decoder_start_token_id. Useful for multilingual models like mBART where the first generated token needs to be the target language token. Default: model.config.forced_bos_token_id.
- forced_eos_token_id: the id of the token to force as the last generated token when max_length is reached. Optionally, use a list to set multiple end-of-sequence tokens. Default: model.config.forced_eos_token_id.
- remove_invalid_values: whether to remove possible NaN and inf outputs of the model to prevent crashes; using remove_invalid_values can slow down generation. Default: model.config.remove_invalid_values.
- exponential_decay_length_penalty: a tuple (start_index, decay_factor) where start_index indicates where the penalty starts and decay_factor represents the factor of exponential decay.
- suppress_tokens: tokens to suppress during generation; the SuppressTokens logit processor will set their log probs to -inf so that they are not sampled.
- begin_suppress_tokens: tokens to suppress at the beginning of generation; the SuppressBeginTokens logit processor will set their log probs to -inf so that they are not sampled.
- forced_decoder_ids: a mapping from generation index to the token id forced at that position, e.g. [[1, 123]] means the second generated token will always be a token of index 123.
- guidance_scale: classifier-free guidance (CFG), enabled by setting guidance_scale > 1. Higher guidance scale encourages the model to generate samples that are more closely linked to the input prompt, usually at the expense of poorer quality.
- output_attentions: whether to return attention tensors; see attentions under returned tensors for more details. Default: false.
- output_hidden_states: whether to return hidden states; see hidden_states under returned tensors for more details. Default: false.
- output_scores: whether to return prediction scores; see scores under returned tensors for more details. Default: false.
- output_logits: whether to return unprocessed logits; see logits under returned tensors for more details.
- return_dict_in_generate: whether to return a [~utils.ModelOutput] instead of a plain tuple. Default: false.
- encoder_no_repeat_ngram_size: all ngrams of this size that occur in the encoder_input_ids cannot occur in the decoder_input_ids. Default: 0.
- decoder_start_token_id: the start token id for decoding, or a list of length batch_size. Indicating a list enables different start ids for each element in the batch (e.g. multilingual models with different target languages in one batch).
- num_assistant_tokens: higher values of num_assistant_tokens make the generation more speculative: if the assistant model is performant, larger speed-ups can be reached; if the assistant model requires lots of corrections, lower speed-ups are reached. Default: 5.
- num_assistant_tokens_schedule: heuristic: when all speculative tokens are correct, increase num_assistant_tokens by 2, else reduce by 1; the num_assistant_tokens value is persistent over multiple generation calls with the same assistant model. heuristic_transient: same as heuristic, but num_assistant_tokens is reset to its initial value after each generation call. constant: num_assistant_tokens stays unchanged during generation. Default: heuristic.
- Any other kwargs are forwarded to the generate function of the model; kwargs that are not present in generate's signature will be used in the model forward pass.

Full parameter list available here, courtesy of Hugging Face.
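Settings from the list above go in the request's params object. A minimal sketch follows; whether this particular text-classification model honors a sampling parameter like top_p is an assumption, and the file name is illustrative:

```bash
# Build a request body with a generation param set
# (top_p here is illustrative; this model may ignore it).
cat > payload_with_params.json <<'EOF'
{
  "text": "This product exceeded my expectations.",
  "params": { "top_p": 0.9 }
}
EOF

# Confirm the body is valid JSON before sending it.
python3 -m json.tool payload_with_params.json > /dev/null && echo "params payload ok"

# Then send it to the running container:
# curl --location 'http://0.0.0.0:8000/run' \
#   --header 'Content-Type: application/json' \
#   --data @payload_with_params.json
```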
Using models locally offers enhanced privacy, control, and customization for your projects. Happy building!