mbari/sdcat

mbari

Sliced Detection and Clustering Analysis Toolkit (SDCAT) github.com/mbari-org/sdcat.git




sdcat: Sliced Detection and Clustering Analysis Toolkit

Author: Danelle, *** . Reach out if you have questions, comments, or suggestions.

Features

Detection: Detects objects in images using a fine-grained saliency-based detection model and/or an object detection model run with the SAHI (https://github.com/obss/sahi) algorithm. The results of the two algorithms can be combined through NMS (Non-Maximum Suppression) to produce the final detections.
- The detection models include YOLOv8s, YOLOS, and various MBARI-specific models for midwater and UAV images.
- The SAHI algorithm slices images into smaller windows and runs a detection model on each window to improve detection accuracy.
Clustering: Clusters the detections using a Vision Transformer (ViT) model and the HDBSCAN (https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html) algorithm with a cosine similarity metric.
Analysis: Analyzes the clustering results.
- A summary of the clustering results, including the number of clusters, cluster coverage, etc., is generated in a JSON file. Example summary:

```json
{
    "dataset": {
        "output": "/data/output",
        "clustering_algorithm": "HDBSCAN",
        "clustering_parameters": {
            "min_cluster_size": 2,
            "min_samples": 1,
            "cluster_selection_method": "leaf",
            "metric": "precomputed",
            "algorithm": "best",
            "alpha": 1.3,
            "cluster_selection_epsilon": 0.0,
            "use_pca": false
        },
        "feature_embedding_model": "MBARI-org/mbari-uav-vit-b-16",
        "roi": true,
        "input": [
            "/data/input"
        ],
        "image_resolution": "224x224 pixels",
        "detection_count": 328
    },
    "statistics": {
        "total_clusters": 5,
        "cluster_coverage": "0.99 (99.99%)",
        "top_predictions": [
            {
                "class": "Batray",
                "percentage": "89.33%"
            },
            {
                "class": "Buoy",
                "percentage": "2.44%"
            },
            {
                "class": "Otter",
                "percentage": "4.57%"
            },
            {
                "class": "Secci_Disc",
                "percentage": "0.30%"
            },
            {
                "class": "Shark",
                "percentage": "3.35%"
            }
        ]
    },
    "sdcat_version": "1.27.8",
    "command": "sdcat cluster roi --roi-dir /data/input --save-dir /data/output --device cpu --use-vits --vits-batch-size 10 --hdbscan-batch-size 100"
}
```
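The percentages in the summary are stored as strings, so they need a little parsing before they can be sorted or summed. The following is a minimal sketch of that step, not part of sdcat: it embeds an abridged copy of the example summary instead of reading a file, and the helper name `top_predictions` is hypothetical.

```python
import json

# Abridged copy of the example summary; in practice this text would be
# read from the summary JSON file instead of being embedded here.
SUMMARY_TEXT = """
{
  "statistics": {
    "total_clusters": 5,
    "top_predictions": [
      {"class": "Batray", "percentage": "89.33%"},
      {"class": "Buoy", "percentage": "2.44%"},
      {"class": "Otter", "percentage": "4.57%"},
      {"class": "Secci_Disc", "percentage": "0.30%"},
      {"class": "Shark", "percentage": "3.35%"}
    ]
  }
}
"""


def top_predictions(summary: dict) -> list[tuple[str, float]]:
    """Return (class, percentage) pairs sorted from most to least frequent."""
    rows = summary["statistics"]["top_predictions"]
    # Strip the trailing "%" so the value can be treated as a number.
    parsed = [(row["class"], float(row["percentage"].rstrip("%"))) for row in rows]
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


summary = json.loads(SUMMARY_TEXT)
ranked = top_predictions(summary)
for name, pct in ranked:
    print(f"{name:<12}{pct:6.2f}%")
```

Sorting by the numeric value rather than the string avoids ordering "4.57%" after "89.33%" purely by accident of the first character.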

Visualization: Visualizes the detections and clusters.

Grid output

Cluster summary

Saliency detected bounding boxes

Who is this for?

If your images look something like the image below, and you want to detect objects in the images, and optionally cluster the detections, then this repository may be useful to you, particularly for discovery and/or to quickly gather training data to train a custom model.

The repository is designed to be run from the command line, and can be run in a Docker container, with or without a GPU (a GPU is recommended). To use multiple GPUs, use the --device cuda option. To use a single GPU, use the --device cuda:0,1 option.

Detection

Detection can be done with a fine-grained saliency-based detection model and/or one of the following models run with the SAHI algorithm. Both detection algorithms (saliency and object detection) are run by default and combined to produce the final detections. SAHI is short for Slicing Aided Hyper Inference, a method that slices images into smaller windows and runs a detection model on each window.

| Object Detection Model | Description | Installation |
|---|---|---|
| yolov8s | YOLOv8s model from Ultralytics | `pip install -U ultralytics` |
| yolov11s | YOLOv11s model from Ultralytics | `pip install -U ultralytics` |
| hustvl/yolos-small | YOLOS model, a Vision Transformer (ViT) | included |
| hustvl/yolos-tiny | YOLOS model, a Vision Transformer (ViT) | included |
| MBARI-org/megamidwater (default) | MBARI midwater YOLOv5x for general detection in midwater images | `pip install -U yolov5==7.0.14` |
| MBARI-org/uav-yolov5 | MBARI UAV YOLOv5x for general detection in UAV images | `pip install -U yolov5==7.0.14` |
| MBARI-org/yolov5x6-uavs-oneclass | MBARI UAV YOLOv5x for general detection in UAV images, single class | `pip install -U yolov5==7.0.14` |
| FathomNet/MBARI-315k-yolov5 | MBARI YOLOv5x for general detection in benthic images | `pip install -U yolov5==7.0.14` |
| rfdetr-base | RF-DETR base model | `pip install -U inference rfdetr` |
| rfdetr-large | RF-DETR large model | `pip install -U inference rfdetr` |

To skip saliency detection, use the --skip-saliency option.

```shell
sdcat detect --skip-saliency --image-dir <image-dir> --save-dir <save-dir> --model <model> --slice-size-width 900 --slice-size-height 900
```

To skip using the SAHI algorithm, use --skip-sahi.

```shell
sdcat detect --skip-sahi --image-dir <image-dir> --save-dir <save-dir> --model <model> --slice-size-width 900 --slice-size-height 900
```
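When both detectors run, their boxes are merged with Non-Maximum Suppression. The sketch below illustrates the general greedy NMS technique only; it is not sdcat's implementation, and the box coordinates, scores, and the 0.5 IoU threshold are made-up values for illustration.

```python
# A box is (x1, y1, x2, y2, score); all values below are illustrative.
Box = tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], iou_threshold: float = 0.5) -> list[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    kept: list[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, other) <= iou_threshold for other in kept):
            kept.append(box)
    return kept


# Hypothetical output of the two detectors on the same image.
saliency_boxes = [(10, 10, 50, 50, 0.60), (200, 200, 240, 240, 0.40)]
model_boxes = [(12, 12, 52, 52, 0.90), (400, 100, 460, 160, 0.80)]

final = nms(saliency_boxes + model_boxes)
for box in final:
    print(box)
```

Here the first saliency box overlaps the higher-scoring model box almost entirely, so only the model box survives, while the two non-overlapping boxes are both kept.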

ViTS + HDBSCAN Clustering

Once the detections are generated, they can be clustered. Alternatively, a collection of cropped images, sometimes referred to as regions of interest (ROIs), can be clustered directly by providing the images in a folder with the roi option.

CODE_TOKEN_3

Alternatively, you can provide a file containing a list of full paths to ROI images:

CODE_TOKEN_4

The clustering is done with a Vision Transformer (ViT) model and the HDBSCAN algorithm with a cosine similarity metric. Clustering is generally done on a fine-grained scale, then clusters are combined using exemplars extracted from each cluster; this also helps reassign noisy detections to the nearest cluster. Processing has been optimized to run in batches of 50K (default) to support large collections of detections/ROIs.

What is an embedding? An embedding is a vector representation of an object in an image.
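To make the idea concrete, the sketch below compares toy three-element embeddings with cosine similarity and assigns a noisy detection to the cluster with the most similar exemplar. It is an illustration of the general idea only: real ViT embeddings are much longer, the vectors are invented, and `nearest_exemplar` is a hypothetical helper rather than sdcat's actual reassignment logic.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_exemplar(embedding: list[float], exemplars: dict[int, list[float]]) -> int:
    """Return the id of the cluster whose exemplar is most similar."""
    return max(exemplars, key=lambda cid: cosine_similarity(embedding, exemplars[cid]))


# Invented exemplar embeddings for two clusters, keyed by cluster id.
exemplars = {0: [1.0, 0.1, 0.0], 1: [0.0, 1.0, 0.2]}

# An invented embedding for a detection that was left unclustered (noise).
noisy = [0.9, 0.2, 0.1]

assigned = nearest_exemplar(noisy, exemplars)
print(f"noisy detection assigned to cluster {assigned}")
```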

The defaults are set to produce fine-grained clusters, but the parameters can be adjusted to produce coarser clusters. The algorithm workflow looks like this:

| Vision Transformer (ViT) Models | Description |
|---|---|
| google/vit-base-patch16-224 (default) | 16 block size, trained on ImageNet21k with 21k classes |
| facebook/dino-vits8 | trained on ImageNet, which contains 1.3 M images with labels from 1000 classes |
| facebook/dino-vits16 | trained on ImageNet, which contains 1.3 M images with labels from 1000 classes |
| MBARI-org/mbari-uav-vit-b-16 | MBARI UAV vits16 model trained on 10425 UAV images with labels from 21 classes |

A smaller block size means more patches and more accurate fine-grained clustering of smaller objects, so ViTS models with a block size of 8 are recommended for fine-grained clustering of small objects, and a block size of 16 is recommended for coarser clustering of larger objects. We recommend running with multiple models to see which works best for your data, and experimenting with the --min-samples and --min-cluster-size options to get good clustering results.
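The effect of block size on patch count is simple arithmetic: a ViT splits a square input into non-overlapping square patches. A minimal sketch, assuming the 224x224 pixel input resolution shown in the example summary (the function name is illustrative):

```python
def patch_count(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    per_side = image_size // patch_size
    return per_side * per_side


for patch_size in (16, 8):
    print(f"patch size {patch_size}: {patch_count(224, patch_size)} patches")
```

Halving the block size from 16 to 8 quadruples the number of patches (196 to 784), which is why the smaller block size resolves finer detail at a higher compute cost.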

Installation

Pip install the sdcat package with:

CODE_TOKEN_5

Alternatively, Docker (https://www.docker.com) can be used to run the code. A pre-built Docker image with the latest version of the code is available at https://hub.docker.com/r/mbari/sdcat.

First Detection

CODE_TOKEN_6

Followed by clustering CODE_TOKEN_7

A GPU is recommended for clustering and detection. If you don't have a GPU, you can still run the code, but it will be slower. If running on a CPU, multiple cores are recommended to speed up processing. Once your clustering is complete, subsequent runs will be faster as the necessary information is cached to support fast iteration.

CODE_TOKEN_8

Usage

To get all options available, use the --help option. For example:

CODE_TOKEN_9 which will print out the following: CODE_TOKEN_10

To get details on a particular command, use the --help option with the command. For example, with the cluster command:

CODE_TOKEN_11

which will print out the following: CODE_TOKEN_12

File organization

The sdcat toolkit generates data in the following folders.

For detections, the output is organized in a folder with the following structure:

CODE_TOKEN_13

For clustering, the output is organized in a folder with the following structure:

CODE_TOKEN_14

Process images creating bounding box detections with the YOLOv8s model.

The YOLOv8s model is not as accurate as other models, but is fast and good for detecting larger objects in images, and good for experiments and quick results. Slice size is the size of the detection window. The default is to allow the SAHI algorithm to determine the slice size; a smaller slice size will take longer to process.

CODE_TOKEN_15
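To see why a smaller slice size takes longer, it helps to count the windows a sliding-window slicer produces. The sketch below is a simplified illustration of the idea, not SAHI's actual code; the 4000x3000 image size and the 0.2 overlap ratio are assumptions chosen for the example.

```python
def slice_windows(image_w: int, image_h: int, slice_w: int, slice_h: int,
                  overlap: float = 0.2) -> list[tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) windows covering the image with overlap."""
    step_x = max(1, int(slice_w * (1 - overlap)))
    step_y = max(1, int(slice_h * (1 - overlap)))
    windows = []
    y = 0
    while True:
        # Clamp the last row to the image edge, keeping the window size.
        y2 = min(y + slice_h, image_h)
        y1 = max(0, y2 - slice_h)
        x = 0
        while True:
            x2 = min(x + slice_w, image_w)
            x1 = max(0, x2 - slice_w)
            windows.append((x1, y1, x2, y2))
            if x2 >= image_w:
                break
            x += step_x
        if y2 >= image_h:
            break
        y += step_y
    return windows


for size in (900, 450):
    count = len(slice_windows(4000, 3000, size, size))
    print(f"slice size {size}: {count} windows")
```

Each window is one more pass through the detection model, so halving the slice size roughly quadruples the work per image.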

Cluster detections from the YOLOv8s model, but use the classifications from the ViT model.

Cluster the detections from the YOLOv8s model. The detections are clustered using cosine similarity and embedding features from the default Vision Transformer (ViT) model, google/vit-base-patch16-224.

CODE_TOKEN_16

Performance Notes

🚀 The RAPIDS (https://rapids.ai/) package is supported for speed-up with CUDA. Enable it by installing RAPIDS and using the --cuhdbscan option. When RAPIDS is enabled, Euclidean distance is used as an approximation of cosine distance, so the results may not be exactly the same as with the default HDBSCAN implementation.
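Euclidean distance can stand in for cosine distance because, for unit-length vectors, the squared Euclidean distance equals twice the cosine distance: ‖a − b‖² = 2(1 − cos(a, b)). The sketch below checks this identity numerically on two invented vectors. Whether sdcat normalizes embeddings in this way is an assumption; the source does not say.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two invented embeddings, normalized to unit length.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

squared_euclidean = euclidean_distance(a, b) ** 2
twice_cosine = 2.0 * cosine_distance(a, b)
print(squared_euclidean, twice_cosine)
```

Because the relationship is monotonic, nearest neighbours under one distance are nearest neighbours under the other, although density-based results can still differ slightly since the distances themselves are not identical.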

Large collections of images: HDBSCAN is slow with cosine similarity, so large collections of detections/ROIs are processed in batches. Use the --vits-batch-size option to set the batch size for your ViTS model; the default is 32, which means the ViTS model processes 32 images at a time. Use the --hdbscan-batch-size option to set the batch size for HDBSCAN. If you have a large collection of detections/ROIs, you may want to maximize both batch sizes to speed up processing.
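Batching here simply means splitting the list of inputs into fixed-size chunks, with a smaller final chunk when the total is not a multiple of the batch size. A minimal sketch of the general pattern follows; the file names are invented and this is not sdcat's internal code.

```python
from typing import Iterator, Sequence


def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 70 invented ROI file names, processed 32 at a time.
roi_paths = [f"roi_{i:03d}.png" for i in range(70)]
sizes = [len(batch) for batch in batches(roi_paths, 32)]
print(sizes)
```

A larger batch size means fewer batches and less per-batch overhead, at the cost of more memory per step.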

Temporary directory: It is sometimes useful to set an alternative temporary directory, for example on systems with limited disk space or when a faster disk is available for temporary files. To do so, set the TMPDIR environment variable to the path of the directory you want to use. This directory stores temporary files created by the sdcat toolkit during processing. Much of the data is stored in the directory specified with the --save-dir option, but some temporary files are stored in the system's default temporary directory.

```shell
export TMPDIR=/path/to/your/tmpdir
```
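Python's standard `tempfile` module consults TMPDIR when choosing where to create temporary files, which is one way such a setting can take effect. The sketch below demonstrates that behaviour only; it assumes, without confirmation from the source, that sdcat creates its temporary files through `tempfile`, and the helper name is hypothetical.

```python
import os
import tempfile


def temp_dir_with_override(path: str) -> str:
    """Return the directory tempfile would use if TMPDIR were set to path."""
    previous = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = path
    tempfile.tempdir = None  # clear the cached choice so TMPDIR is re-read
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the original environment and clear the cache again.
        if previous is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = previous
        tempfile.tempdir = None


with tempfile.TemporaryDirectory() as scratch:
    print(temp_dir_with_override(scratch))
```

Note that `tempfile` only uses a TMPDIR that exists and is writable; otherwise it falls back to the next candidate directory.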

Related work

- SAHI: https://github.com/obss/sahi
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.***
- DINOv2: https://github.com/***research/dinov2
- HDBSCAN: https://arxiv.org/pdf/1911.02282.pdf
- Specularity Removal: https://github.com/muratkrty/specularity-removal

Pulling the image

You can pull the image with the commands below. Replace <tag> with a specific tag version.

Pull through the Xuanyuan mirror:

docker pull docker.xuanyuan.run/mbari/sdcat:<tag>

Pull directly from Docker Hub:

docker pull mbari/sdcat:<tag>
