bytez/samil24_whisper-large-sorani-v1 Docker Image Overview

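samil24/whisper-large-sorani-v1 is an automatic speech recognition model packaged as a Docker container. It supports local deployment and GPU acceleration, requires a Bytez API key, exposes configurable generation parameters, caches model weights locally, and is intended for speech-to-text tasks.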

Downloads: 0; Status: community image; Maintainer: bytez; Repository type: image; Last updated: 7 months ago

samil24/whisper-large-sorani-v1 Docker Image Documentation

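Image Overview

samil24/whisper-large-sorani-v1 is a containerized automatic speech recognition (automatic-speech-recognition) model. It is based on the Whisper-large architecture, tuned for Sorani, and can also be used for multilingual speech-to-text tasks. The image runs locally, offers flexible parameter configuration, GPU acceleration, and local caching of model weights, and must be activated with a Bytez API key.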

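Core Features

Automatic speech recognition: converts audio to text, with multilingual support (via the language parameter) and both transcription and translation (via the task parameter)
Local deployment: runs as a Docker container, with no complex environment setup
GPU acceleration: supports Nvidia GPUs, enabled with the --gpus flag
Model weight caching: stores model weights in a local directory to avoid repeated downloads
Flexible parameter configuration: exposes a wide range of generation parameters (length control, sampling strategy, logits adjustment, and more)
API integration: inference runs over HTTP POST requests, which makes it easy to integrate with application systems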

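Use Cases

Speech-to-text application development
Multilingual speech data processing
Privacy-sensitive speech recognition that must run locally
Research and development that needs custom speech recognition parameters
Efficient speech processing in resource-constrained environments (by caching weights locally)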

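Usage

Prerequisites

Install Docker (GPU support additionally requires Nvidia Docker)
Create a Bytez account and obtain an API key: open the Bytez settings page, log in, and copy your API key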

Pulling the Image

bash
docker pull bytez/samil24_whisper-large-sorani-v1

Basic Run (CPU)

bash
docker run -it \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -e PORT=8000 \
  -p 8000:8000 \
  bytez/samil24_whisper-large-sorani-v1

Changing the Port

To use a different port (for example, port 80), set both the PORT variable and the -p mapping:

bash
docker run -it \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -e PORT=80 \
  -p 80:80 \
  bytez/samil24_whisper-large-sorani-v1

Caching Model Weights Locally

To avoid downloading the large model weights again on every start, store them in a local directory:

bash
docker run -it \
  -v /path/to/local/cache:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/samil24_whisper-large-sorani-v1

Example (Linux):

bash
docker run -it \
  -v /home/user/models:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/samil24_whisper-large-sorani-v1

Running with GPU Acceleration

Make sure the Nvidia driver and CUDA are installed, then add the --gpus all flag:

bash
docker run -it \
  --gpus all \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/samil24_whisper-large-sorani-v1

GPU Acceleration + Local Cache

bash
docker run -it \
  --gpus all \
  -v /path/to/local/cache:/server/model \
  -e HF_HOME=/server/model \
  -e KEY=YOUR_BYTEZ_API_KEY_HERE \
  -p 8000:8000 \
  -e PORT=8000 \
  bytez/samil24_whisper-large-sorani-v1
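
The run variants above differ only in which flags are present. The sketch below assembles the command from optional settings and prints it for review instead of executing it. The function name build_run_cmd and the BYTEZ_* variables are hypothetical names introduced for this example; they are not defined by the image.

```bash
#!/usr/bin/env bash
# Sketch: compose the `docker run` command from optional settings.
# BYTEZ_KEY, BYTEZ_PORT, BYTEZ_GPU and BYTEZ_CACHE_DIR are hypothetical
# shell variables chosen for this example, not settings read by the image.

IMAGE="bytez/samil24_whisper-large-sorani-v1"

build_run_cmd() {
  local port="${BYTEZ_PORT:-8000}"
  local cmd=(docker run -it)

  # Request GPU access only when BYTEZ_GPU=1.
  if [ "${BYTEZ_GPU:-0}" = "1" ]; then
    cmd+=(--gpus all)
  fi

  # Mount a host directory for the weights only when one is given.
  if [ -n "${BYTEZ_CACHE_DIR:-}" ]; then
    cmd+=(-v "${BYTEZ_CACHE_DIR}:/server/model" -e HF_HOME=/server/model)
  fi

  # The container port and the published host port are kept identical,
  # as in the examples above.
  cmd+=(-e "KEY=${BYTEZ_KEY:-YOUR_BYTEZ_API_KEY_HERE}" -e "PORT=${port}" -p "${port}:${port}" "$IMAGE")

  # Print the command on one line so it can be inspected before use.
  printf '%s\n' "${cmd[*]}"
}
```

For example, `BYTEZ_GPU=1 BYTEZ_CACHE_DIR=/home/user/models build_run_cmd` prints the GPU + local cache variant with the default port 8000.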

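Sending an Inference Request

Once the container is running, send an HTTP POST request to run speech recognition. Set language to the spoken language (for example "sorani" or "english") and task to either "transcribe" (transcription) or "translate" (translation):

bash
curl --location 'http://0.0.0.0:8000/run' \
--header 'Content-Type: application/json' \
--data-raw '{
  "b64AudioBufferWav": "BASE64_ENCODED_WAV_AUDIO_DATA",
  "params": {
    "forward_params": {
      "language": "french",
      "task": "transcribe"
    }
  }
}'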
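
The request expects the WAV file as a Base64 string. The sketch below shows one way to produce that string and build the request body from a file on disk. The helper names build_payload and transcribe and the BYTEZ_URL variable are hypothetical; the endpoint and JSON field names are taken from the request above.

```bash
#!/usr/bin/env bash
# Sketch: encode a WAV file and build the JSON body for the /run endpoint.
# build_payload, transcribe and BYTEZ_URL are hypothetical names.

build_payload() {
  local wav_file="$1" language="$2" task="${3:-transcribe}"
  local b64
  # Strip newlines so the Base64 text forms a single JSON string.
  # Base64 output contains only A-Z, a-z, 0-9, +, / and =, so it needs
  # no JSON escaping.
  b64="$(base64 < "$wav_file" | tr -d '\n')"
  printf '{"b64AudioBufferWav":"%s","params":{"forward_params":{"language":"%s","task":"%s"}}}' \
    "$b64" "$language" "$task"
}

transcribe() {
  # Send the body on stdin (--data-binary @-) so that long audio does not
  # run into the command-line length limit.
  build_payload "$@" | curl --silent --show-error \
    --header 'Content-Type: application/json' \
    --data-binary @- \
    "${BYTEZ_URL:-http://0.0.0.0:8000/run}"
}
```

With a running container, `transcribe recording.wav french transcribe` would post the file; the format of the response is not described in this document, so the sketch prints it unchanged.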

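Environment Variables

Environment variable	Description	Default	Allowed values
`KEY`	Bytez API key (required)	-	API key obtained from Bytez
`PORT`	Port the server listens on inside the container	8000	Any unused port
`DEVICE`	Device to run on	`auto`	`auto` (auto-detect), `cuda` (GPU), `cpu` (CPU)
`HF_HOME`	Model weight cache path	-	Path inside the container (mount a local directory to it with `-v`)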
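
When several of these variables are set together, an env file can replace the repeated -e flags. The sketch below writes one and shows, as a comment, how it might be passed to Docker with --env-file. The file name bytez.env and the host path are arbitrary choices for this example.

```bash
# Sketch: collect the container settings in an env file.
# bytez.env is an arbitrary file name chosen for this example.
cat > bytez.env <<'EOF'
KEY=YOUR_BYTEZ_API_KEY_HERE
PORT=8000
DEVICE=cpu
HF_HOME=/server/model
EOF

# The container could then be started with (host path is a placeholder):
# docker run -it --env-file bytez.env -p 8000:8000 \
#   -v /home/user/models:/server/model \
#   bytez/samil24_whisper-large-sorani-v1
```

Keeping the key in a file rather than on the command line also keeps it out of the shell history; the file should then be excluded from version control.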

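Model Parameters

Length Control

Parameter	Type	Description	Default
`max_length`	int	Maximum length of the generated token sequence	20
`max_new_tokens`	int	Maximum number of new tokens to generate (ignoring tokens in the prompt)	-
`min_length`	int	Minimum length of the generated sequence	0
`min_new_tokens`	int	Minimum number of new tokens to generate (ignoring tokens in the prompt)	-
`early_stopping`	bool/str	Stopping condition for beam-based methods	False
`max_time`	float	Maximum time the computation is allowed to run, in seconds	-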
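Generation Strategy

Parameter	Type	Description	Default
`do_sample`	bool	Whether to use sampling (instead of greedy decoding)	False
`num_beams`	int	Number of beams for beam search	1
`num_beam_groups`	int	Number of groups used for beam diversity	1
`penalty_alpha`	float	Balances model confidence against the degeneration penalty	-
`use_cache`	bool	Whether to use the cache to speed up decoding	True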
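
As an illustration of how the length and strategy parameters might be combined, the sketch below builds a request body that asks for beam search with a cap on new tokens. This document does not state where the server expects generation parameters; placing them inside forward_params next to language and task is an assumption, and build_beam_payload is a hypothetical helper name.

```bash
# Sketch: request body with beam search and a token limit.
# ASSUMPTION: generation parameters are accepted inside "forward_params";
# the source only shows "language" and "task" there.
build_beam_payload() {
  local b64_audio="$1" num_beams="${2:-4}" max_new_tokens="${3:-128}"
  # num_beams and max_new_tokens are emitted as JSON numbers (%d);
  # do_sample stays false so that decoding is deterministic beam search.
  printf '{"b64AudioBufferWav":"%s","params":{"forward_params":{"task":"transcribe","num_beams":%d,"max_new_tokens":%d,"do_sample":false}}}' \
    "$b64_audio" "$num_beams" "$max_new_tokens"
}
```

The first argument is the Base64-encoded WAV data; if the server rejects the extra keys, the placement assumption above is the first thing to revisit.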

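Logits Adjustment

Parameter	Type	Description	Default
`temperature`	float	Temperature used to modulate next-token probabilities	1
`top_k`	int	Number of highest-probability vocabulary tokens kept by top-k filtering	50
`top_p`	float	Keeps the smallest set of most probable tokens whose cumulative probability reaches this value	1
`typical_p`	float	Local typicality threshold	1
`epsilon_cutoff`	float	Conditional probability threshold (only tokens above it are kept)	0
`eta_cutoff`	float	Eta sampling threshold	0
`diversity_penalty`	float	Diversity penalty in group beam search	0
`repetition_penalty`	float	Repetition penalty (1.0 means no penalty)	1
`encoder_repetition_penalty`	float	Encoder repetition penalty	1
`length_penalty`	float	Length penalty exponent (>0 favors longer sequences, <0 favors shorter ones)	1
`no_repeat_ngram_size`	int	Size of n-grams that may not be repeated (active when >0)	0
`bad_words_ids`	List[List[int]]	Token ids that must not be generated	-
`force_words_ids`	List[List[int]]	Token ids that must be generated	-
`renormalize_logits`	bool	Whether to renormalize after applying the logits processors	False
`constraints`	List[Constraint]	Custom generation constraints	-
`forced_bos_token_id`	int	Id of the token forced as the first generated token	Model config default
`forced_eos_token_id`	int/List[int]	Id of the token forced as the end token	Model config default
`remove_invalid_values`	bool	Whether to remove nan/inf outputs	Model config default
`exponential_decay_length_penalty`	tuple(int, float)	Exponentially decaying length penalty (start index, decay factor)	-
`suppress_tokens`	List[int]	Tokens suppressed during generation	-
`begin_suppress_tokens`	List[int]	Tokens suppressed at the start of generation	-
`forced_decoder_ids`	List[List[int]]	Forced token mapping (generation index → token id)	-
`sequence_bias`	Dict[Tuple[int], float]	Sequence bias (keys are token sequences, values are bias terms)	-
`guidance_scale`	float	Guidance scale for classifier-free guidance (enabled when >1)	-
`low_memory`	bool	Whether to enable low-memory mode	-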
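
Two of the parameters above have compact standard definitions. The formulas below follow the conventions of Hugging Face Transformers, whose parameter names this table mirrors; that the image applies them unchanged is an assumption, since its internals are not described here. The symbols are introduced for this note: $z_i$ is the logit of token $i$, $T$ is `temperature`, $L$ is the length of a generated hypothesis $y$ for audio input $x$, and $\alpha$ is `length_penalty`.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
\qquad
\mathrm{score}(y) = \frac{\sum_{t=1}^{L} \log p(y_t \mid y_{<t}, x)}{L^{\alpha}}
```

For logits $(2, 1, 0)$, $T = 1$ gives probabilities of roughly $(0.67, 0.24, 0.09)$ and $T = 2$ gives roughly $(0.51, 0.31, 0.19)$, so a higher temperature flattens the distribution. In the beam score the numerator is negative, so a larger $\alpha$ divides it by a larger number and raises the score of longer hypotheses, which matches the description of `length_penalty` in the table.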

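Generation Output Parameters

Parameter	Type	Description	Default
`num_return_sequences`	int	Number of independent sequences returned per input	1
`output_attentions`	bool	Whether to return the attention tensors	False
`output_hidden_states`	bool	Whether to return the hidden states of all layers	False
`output_scores`	bool	Whether to return the prediction scores	False
`output_logits`	bool	Whether to return the unprocessed logit scores	-
`return_dict_in_generate`	bool	Whether to return a ModelOutput object	False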

Special Tokens

Parameter	Type	Description	Default
`pad_token_id`	int	Id of the padding token	-
`bos_token_id`	int	Id of the beginning-of-sequence token	-
`eos_token_id`	int/List[int]	Id of the end-of-sequence token	-

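Encoder-Decoder Generation Parameters

Parameter	Type	Description	Default
`encoder_no_repeat_ngram_size`	int	Size of n-grams from the encoder input that may not appear in the decoder output	0
`decoder_start_token_id`	int/List[int]	Id of the token the decoder starts with	Model config default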