
Part of https://github.com/FaultMaven/faultmaven — The AI-Powered Troubleshooting Copilot
FaultMaven Knowledge Management Microservice - Open source RAG-powered knowledge base for troubleshooting documentation.
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) [![Docker](https://img.shields.io/badge/docker-ready-blue.svg)](https://hub.docker.com/r/faultmaven/fm-knowledge-service)
The Knowledge Service implements a Retrieval-Augmented Generation (RAG) system for FaultMaven, allowing users to upload and search through troubleshooting documentation. Documents are chunked, embedded using BGE-M3 embeddings, and stored in a vector database for fast semantic search.
Features:
The service uses a provider pattern for vector database abstraction:
| Provider | Use Case | Scale | Configuration |
|---|---|---|---|
| ChromaDB (default) | Laptop/dev, self-hosted | ~100K documents | Embedded SQLite backend |
| Pinecone | Production, enterprise | Billions of documents | Managed cloud service |
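The provider pattern in the table above can be sketched as follows. This is a minimal illustration, not the service's actual code: `VectorStore`, `InMemoryStore`, `PROVIDERS`, and `make_vector_store` are hypothetical names, and the in-memory backend stands in for ChromaDB and Pinecone so the example runs without external services. Only the `VECTOR_DB_PROVIDER` variable and its `chroma`/`pinecone` values come from this README.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class VectorStore(ABC):
    """Hypothetical interface that every vector database provider implements."""

    @abstractmethod
    def upsert(self, chunk_id: str, embedding: List[float]) -> None:
        ...

    @abstractmethod
    def query(self, embedding: List[float], limit: int) -> List[Tuple[str, float]]:
        ...


class InMemoryStore(VectorStore):
    """Stand-in backend; a real provider would wrap ChromaDB or Pinecone."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, chunk_id: str, embedding: List[float]) -> None:
        self._vectors[chunk_id] = list(embedding)

    def query(self, embedding: List[float], limit: int) -> List[Tuple[str, float]]:
        # Rank stored vectors by dot product with the query, highest first.
        scored = [
            (chunk_id, sum(a * b for a, b in zip(embedding, vector)))
            for chunk_id, vector in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


# Assumption: both provider names map to the stand-in so the sketch is runnable.
PROVIDERS = {"chroma": InMemoryStore, "pinecone": InMemoryStore}


def make_vector_store(provider: Optional[str] = None) -> VectorStore:
    """Pick a backend from the argument or the VECTOR_DB_PROVIDER variable."""
    name = (provider or os.environ.get("VECTOR_DB_PROVIDER", "chroma")).lower()
    if name not in PROVIDERS:
        raise ValueError(f"Unknown vector DB provider: {name!r}")
    return PROVIDERS[name]()
```

Callers depend only on the `VectorStore` interface, so switching backends becomes a configuration change rather than a code change.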
Benefits:
```bash
# Run with persistent storage
docker run -d -p 8004:8004 \
  -v ./data/chromadb:/data/chromadb \
  -v ./data/sqlite:/data/sqlite \
  faultmaven/fm-knowledge-service:latest
```
The service will be available at http://localhost:8004.
See https://github.com/FaultMaven/faultmaven-deploy for complete deployment with all FaultMaven services.
```bash
# Clone repository
git clone https://github.com/FaultMaven/fm-knowledge-service.git
cd fm-knowledge-service

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Run service
uvicorn knowledge_service.main:app --reload --port 8004
```
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/documents/upload | Upload document (multipart/form-data) |
| GET | /api/v1/documents | List user's documents |
| GET | /api/v1/documents/{document_id} | Get document metadata |
| DELETE | /api/v1/documents/{document_id} | Delete document |

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/search | Semantic search across documents |
| POST | /api/v1/search/hybrid | Hybrid full-text + vector search |

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
Configuration via environment variables:
| Variable | Description | Default |
|---|---|---|
| `SERVICE_NAME` | Service identifier | `fm-knowledge-service` |
| `ENVIRONMENT` | Deployment environment | `development` |
| `PORT` | Service port | `8004` |
| `LOG_LEVEL` | Logging level | `INFO` |

| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | Database connection string | `sqlite+aiosqlite:////data/sqlite/fm_knowledge.db` |

Supported databases:

- `sqlite+aiosqlite:////data/sqlite/fm_knowledge.db`
- `postgresql+asyncpg://user:pass@host:5432/faultmaven`

| Variable | Description | Default |
|---|---|---|
| `VECTOR_DB_PROVIDER` | Vector database provider (`chroma` or `pinecone`) | `chroma` |

ChromaDB Configuration (Default)
| Variable | Description | Default |
|---|---|---|
| `CHROMA_HOST` | ChromaDB server host | `localhost` |
| `CHROMA_PORT` | ChromaDB server port | `8007` |
| `CHROMADB_PATH` | ChromaDB data directory (legacy) | `/data/chromadb` |

Pinecone Configuration
| Variable | Description | Default |
|---|---|---|
| `PINECONE_API_KEY` | Pinecone API key | (required) |
| `PINECONE_ENVIRONMENT` | Pinecone environment (e.g., `us-east-1`) | (required) |
| `PINECONE_INDEX_NAME` | Pinecone index name | `faultmaven-knowledge` |

| Variable | Description | Default |
|---|---|---|
| `EMBEDDING_MODEL` | Embeddings model name | `BAAI/bge-m3` |
| `CHUNK_SIZE` | Text chunk size | `1000` |
| `CHUNK_OVERLAP` | Chunk overlap size | `200` |
| `MAX_UPLOAD_SIZE_MB` | Maximum file size | `10` |
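To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which this README does not state (the service may count tokens instead), and `chunk_text` is a hypothetical helper rather than the service's own function.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

With the defaults, each window advances by 800 characters, so consecutive chunks share 200 characters of context.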
Development (ChromaDB):
```bash
VECTOR_DB_PROVIDER=chroma
CHROMA_HOST=localhost
CHROMA_PORT=8007
DATABASE_URL=sqlite+aiosqlite:////data/sqlite/fm_knowledge.db
```
Production (Pinecone + PostgreSQL):
```bash
VECTOR_DB_PROVIDER=pinecone
PINECONE_API_KEY=your-api-key
PINECONE_ENVIRONMENT=us-east-1
PINECONE_INDEX_NAME=faultmaven-production-kb
DATABASE_URL=postgresql+asyncpg://user:pass@postgres:5432/faultmaven
```
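The variables above could be read and validated at startup roughly as follows. This is a sketch under stated assumptions: `Settings` and `load_settings` are hypothetical names, and the rule that Pinecone settings are checked only when `VECTOR_DB_PROVIDER=pinecone` is inferred from the "(required)" defaults in the tables, not confirmed by the service's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    vector_db_provider: str
    database_url: str
    chroma_host: str
    chroma_port: int
    pinecone_api_key: Optional[str]
    pinecone_environment: Optional[str]
    pinecone_index_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ) using the documented defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        vector_db_provider=env.get("VECTOR_DB_PROVIDER", "chroma").lower(),
        database_url=env.get(
            "DATABASE_URL", "sqlite+aiosqlite:////data/sqlite/fm_knowledge.db"
        ),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8007")),
        pinecone_api_key=env.get("PINECONE_API_KEY"),
        pinecone_environment=env.get("PINECONE_ENVIRONMENT"),
        pinecone_index_name=env.get("PINECONE_INDEX_NAME", "faultmaven-knowledge"),
    )
    if settings.vector_db_provider not in ("chroma", "pinecone"):
        raise ValueError(f"Unsupported VECTOR_DB_PROVIDER: {settings.vector_db_provider!r}")
    if settings.vector_db_provider == "pinecone":
        # Assumption: the Pinecone variables are mandatory only for this provider.
        required = {
            "PINECONE_API_KEY": settings.pinecone_api_key,
            "PINECONE_ENVIRONMENT": settings.pinecone_environment,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError("Missing required variables: " + ", ".join(missing))
    return settings
```

Failing at startup on a missing key surfaces a misconfigured deployment before the first upload or search request reaches the vector store.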
Upload documents via multipart/form-data:
```bash
curl -X POST http://localhost:8004/api/v1/documents/upload \
  -H "X-User-ID: user_123" \
  -F "file=@troubleshooting_guide.pdf" \
  -F "title=Database Troubleshooting Guide" \
  -F "description=Common database issues and solutions" \
  -F "tags=database,performance,errors"
```
Response:
```json
{
  "document_id": "doc_abc123",
  "user_id": "user_123",
  "filename": "troubleshooting_guide.pdf",
  "title": "Database Troubleshooting Guide",
  "description": "Common database issues and solutions",
  "file_type": "pdf",
  "file_size": 245678,
  "chunk_count": 42,
  "tags": ["database", "performance", "errors"],
  "created_at": "2025-11-16T10:30:00Z"
}
```
Search documents using natural language queries:
```bash
curl -X POST http://localhost:8004/api/v1/search \
  -H "X-User-ID: user_123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How to fix database connection timeouts?",
    "limit": 5,
    "min_relevance": 0.7
  }'
```
Response:
```json
{
  "results": [
    {
      "chunk_id": "chunk_001",
      "document_id": "doc_abc123",
      "document_title": "Database Troubleshooting Guide",
      "content": "Connection timeouts typically occur when...",
      "relevance_score": 0.92,
      "metadata": {
        "page": 15,
        "section": "Connection Issues"
      }
    }
  ],
  "query": "How to fix database connection timeouts?",
  "total_results": 5
}
```
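The same search call can be made from Python using only the standard library. The sketch below mirrors the curl example; `build_search_request` and `search` are hypothetical helper names, and the base URL assumes the local Docker setup on port 8004.

```python
import json
import urllib.request


def build_search_request(base_url, user_id, query, limit=5, min_relevance=0.7):
    """Build the POST request for /api/v1/search without sending it."""
    payload = {"query": query, "limit": limit, "min_relevance": min_relevance}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-User-ID": user_id, "Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, user_id, query, **options):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(base_url, user_id, query, **options)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running service, for example the Docker container shown earlier.
    found = search(
        "http://localhost:8004", "user_123", "How to fix database connection timeouts?"
    )
    for hit in found["results"]:
        print(hit["relevance_score"], hit["document_title"])
```

Keeping request construction separate from sending makes the payload and headers easy to inspect in tests without a live service.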
| Extension | Format | Processing |
|---|---|---|
| `.txt` | Plain text | Direct chunking |
| `.md` | Markdown | Direct chunking |
| `.pdf` | PDF | PyPDF2 extraction |
| `.doc`, `.docx` | Word | python-docx extraction |
| `.rtf` | Rich Text | striprtf extraction |
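A client can check a file against the supported extensions and the `MAX_UPLOAD_SIZE_MB` default before uploading. This is a sketch, not the service's own validation: `validate_upload` is a hypothetical helper, and it assumes one megabyte means 1024 × 1024 bytes, which this README does not specify.

```python
from pathlib import Path

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".pdf", ".doc", ".docx", ".rtf"}
MAX_UPLOAD_SIZE_MB = 10  # documented default


def validate_upload(filename: str, size_bytes: int,
                    max_size_mb: int = MAX_UPLOAD_SIZE_MB) -> str:
    """Return the file type (extension without the dot) or raise ValueError."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}")
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    if size_bytes > max_size_mb * 1024 * 1024:
        raise ValueError(f"File exceeds {max_size_mb} MB")
    return extension.lstrip(".")
```

For example, `validate_upload("troubleshooting_guide.pdf", 245678)` returns `"pdf"`, matching the `file_type` field in the upload response.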
```python
from datetime import datetime
from typing import List

# Document metadata schema
{
    "document_id": str,       # Unique identifier
    "user_id": str,           # Owner user ID
    "filename": str,          # Original filename
    "title": str,             # Document title
    "description": str,       # Optional description
    "file_type": str,         # File extension
    "file_size": int,         # Size in bytes
    "chunk_count": int,       # Number of chunks
    "tags": List[str],        # Searchable tags
    "created_at": datetime,   # Upload timestamp
    "updated_at": datetime    # Last modification
}
```

```python
from typing import List

# Chunk schema as stored in the vector database
{
    "chunk_id": str,               # Unique chunk identifier
    "document_id": str,            # Parent document
    "content": str,                # Chunk text
    "embedding": List[float],      # BGE-M3 vector (1024-dim)
    "metadata": {
        "user_id": str,
        "document_title": str,
        "chunk_index": int,
        "file_type": str,
        "tags": List[str]
    }
}
```
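To illustrate how the chunk schema supports semantic search, the sketch below ranks chunks by cosine similarity and applies the `limit` and `min_relevance` parameters from the search request. It assumes `relevance_score` is a cosine similarity, which this README does not confirm. `cosine_similarity` and `search_chunks` are hypothetical names, and a real deployment would delegate this ranking to ChromaDB or Pinecone.

```python
import math
from typing import Any, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_chunks(query_embedding: List[float], chunks: List[Dict[str, Any]],
                  user_id: str, limit: int = 5,
                  min_relevance: float = 0.7) -> List[Dict[str, Any]]:
    """Return the user's best-matching chunks, highest relevance first."""
    results = []
    for chunk in chunks:
        # Only chunks owned by the requesting user are considered.
        if chunk["metadata"]["user_id"] != user_id:
            continue
        score = cosine_similarity(query_embedding, chunk["embedding"])
        if score >= min_relevance:
            results.append({
                "chunk_id": chunk["chunk_id"],
                "document_id": chunk["document_id"],
                "content": chunk["content"],
                "relevance_score": score,
            })
    results.sort(key=lambda item: item["relevance_score"], reverse=True)
    return results[:limit]
```

Filtering on `metadata.user_id` before scoring keeps results scoped to the requesting user, matching the per-user isolation this service describes.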
This service uses trusted header authentication from the FaultMaven API Gateway:
- `X-User-ID` (required): Identifies the user making the request
- `X-User-Email` (optional): User's email address
- `X-User-Roles` (optional): User's roles

All document operations are scoped to the user specified in `X-User-ID`. Users can only access their own documents.
Important: This service should run behind the FaultMaven API Gateway (https://github.com/FaultMaven/faultmaven), which handles authentication and sets these headers. Never expose this service directly to the internet.
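The trusted-header scheme can be sketched in framework-independent Python as follows. `RequestUser`, `MissingUserError`, `user_from_headers`, and `owned_by` are hypothetical names, and treating `X-User-Roles` as a comma-separated list is an assumption, since this README does not define its format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping, Optional


class MissingUserError(Exception):
    """Raised when the gateway did not supply X-User-ID."""


@dataclass(frozen=True)
class RequestUser:
    user_id: str
    email: Optional[str]
    roles: List[str]


def user_from_headers(headers: Mapping[str, str]) -> RequestUser:
    """Extract the caller's identity from the gateway-provided headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_id = normalized.get("x-user-id", "").strip()
    if not user_id:
        raise MissingUserError("X-User-ID header is required")
    # Assumption: roles arrive as a comma-separated string.
    roles = [r.strip() for r in normalized.get("x-user-roles", "").split(",") if r.strip()]
    return RequestUser(user_id=user_id, email=normalized.get("x-user-email"), roles=roles)


def owned_by(user: RequestUser, documents: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the documents that belong to the requesting user."""
    return [doc for doc in documents if doc["user_id"] == user.user_id]
```

Because the headers are trusted as-is, any client that can reach the service directly can set `X-User-ID` to any value, which is why the service must sit behind the gateway.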
```
┌─────────────────┐
│   API Gateway   │  (Handles authentication)
└────────┬────────┘
         │ X-User-ID header
         ↓
┌─────────────────┐
│  Knowledge Svc  │  (Document processing)
└────┬───────┬────┘
     │       │
     ↓       ↓
┌─────────┐ ┌──────────────┐
│ SQLite  │ │  ChromaDB    │
│Metadata │ │Vector Store  │
└─────────┘ └──────────────┘
```
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=knowledge_service

# Run specific test file
pytest tests/test_documents.py -v
```
Apache 2.0 - See LICENSE for details.
See our contributing guide (https://github.com/FaultMaven/.github/blob/main/CONTRIBUTING.md) for detailed guidelines.