
crawlabteam/crawlab-pro

Crawlab Pro is the professional-edition Docker image of Crawlab, a Docker-based distributed crawler management platform. It gives developers and teams a one-stop solution for developing, deploying, managing, and monitoring crawlers. The image packages all of Crawlab Pro's core features and can be deployed quickly in a variety of environments, greatly simplifying the setup and maintenance of a crawler system.
Quick deployment with Docker Compose:
```yaml
version: '3.8'

services:
  master:
    image: crawlab/pro:latest
    container_name: crawlab-master
    restart: always
    environment:
      - CRAWLAB_NODE_MASTER=true
      - CRAWLAB_MONGO_URI=mongodb://mongo:27017/crawlab
      - CRAWLAB_REDIS_URI=redis://redis:6379/0
      - CRAWLAB_USERNAME=admin
      - CRAWLAB_PASSWORD=admin
    ports:
      - "8080:8080"
    depends_on:
      - mongo
      - redis

  mongo:
    image: mongo:4.4
    container_name: crawlab-mongo
    restart: always
    volumes:
      - mongo-data:/data/db

  redis:
    image: redis:6-alpine
    container_name: crawlab-redis
    restart: always
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

volumes:
  mongo-data:
  redis-data:
```
Start the services:
```bash
docker-compose up -d
```
Open the web UI at http://localhost:8080 and log in with the default credentials (admin/admin).
Add a worker node:
```yaml
worker:
  image: crawlab/pro:latest
  container_name: crawlab-worker
  restart: always
  environment:
    - CRAWLAB_NODE_MASTER=false
    - CRAWLAB_GRPC_ADDRESS=master:9000
    - CRAWLAB_MONGO_URI=mongodb://mongo:27017/crawlab
    - CRAWLAB_REDIS_URI=redis://redis:6379/0
  depends_on:
    - master
```
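Since workers hold no unique state of their own, several can run from a single service definition. A minimal sketch, assuming the fixed `container_name` is removed (scaled containers cannot share a name):

```yaml
worker:
  image: crawlab/pro:latest
  restart: always
  environment:
    - CRAWLAB_NODE_MASTER=false
    - CRAWLAB_GRPC_ADDRESS=master:9000
    - CRAWLAB_MONGO_URI=mongodb://mongo:27017/crawlab
    - CRAWLAB_REDIS_URI=redis://redis:6379/0
  depends_on:
    - master
# Then start N instances with: docker-compose up -d --scale worker=3
```

The `--scale` flag is a standard Docker Compose option; the worker count of 3 is only an example.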
| Environment variable | Description | Default |
|---|---|---|
| CRAWLAB_NODE_MASTER | Whether this node is the master | false |
| CRAWLAB_GRPC_ADDRESS | gRPC service address | localhost:9000 |
| CRAWLAB_HTTP_ADDRESS | HTTP service address | 0.0.0.0:8080 |
| CRAWLAB_MONGO_URI | MongoDB connection URI | mongodb://localhost:27017/crawlab |
| CRAWLAB_REDIS_URI | Redis connection URI | redis://localhost:6379/0 |
| CRAWLAB_USERNAME | Administrator username | admin |
| CRAWLAB_PASSWORD | Administrator password | admin |
| CRAWLAB_LOG_LEVEL | Log level | info |
| CRAWLAB_TASK_CONCURRENCY | Task concurrency | 5 |
| CRAWLAB_REGISTER_TYPE | Node registration method | manual |
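Rather than hard-coding credentials in `docker-compose.yml`, the same variables can be kept in an env file, which Compose supports natively via `env_file`. A sketch, with an example file name and example values:

```yaml
# crawlab.env (example file; values shown are placeholders)
CRAWLAB_NODE_MASTER=true
CRAWLAB_MONGO_URI=mongodb://mongo:27017/crawlab
CRAWLAB_REDIS_URI=redis://redis:6379/0
CRAWLAB_USERNAME=admin
CRAWLAB_PASSWORD=change-me
```

The service then references it with `env_file: [crawlab.env]` instead of an inline `environment:` list, keeping secrets out of the Compose file.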
To ensure data persistence, mount the following directories:
```yaml
volumes:
  - ./data:/app/data
  - ./logs:/app/logs
  - ./tasks:/app/tasks
```
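In Compose, these bind mounts sit under the service definition; a minimal sketch for the master service (the purpose of each directory is inferred from its name, and the host paths are examples):

```yaml
services:
  master:
    image: crawlab/pro:latest
    volumes:
      - ./data:/app/data    # application data
      - ./logs:/app/logs    # application and task logs
      - ./tasks:/app/tasks  # task code
```

Bind mounts keep the data on the host filesystem, so `docker-compose down` and image upgrades do not discard it.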
Configure a custom domain and SSL through an Nginx reverse proxy:
```nginx
server {
    listen 80;
    server_name crawlab.example.com;
    return 301 [***]
}

server {
    listen 443 ssl;
    server_name crawlab.example.com;
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    location / {
        proxy_pass [***]
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
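If your Crawlab version streams live task logs over WebSocket (an assumption here; check your release), the proxied location also needs the HTTP upgrade headers, or those connections will fail through the proxy:

```nginx
location / {
    proxy_pass http://localhost:8080;  # example upstream; match your port mapping
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}
```

`proxy_http_version 1.1` plus the `Upgrade`/`Connection` headers is the standard Nginx pattern for proxying WebSocket traffic.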
Add more worker nodes to increase crawling capacity:
```bash
# Note: --link is a legacy Docker feature; for new setups, prefer attaching
# the containers to a shared user-defined network or using Docker Compose.

# Add worker node 1
docker run -d --name crawlab-worker-1 \
  --link crawlab-master:master \
  --link crawlab-mongo:mongo \
  --link crawlab-redis:redis \
  -e CRAWLAB_NODE_MASTER=false \
  -e CRAWLAB_GRPC_ADDRESS=master:9000 \
  -e CRAWLAB_MONGO_URI=mongodb://mongo:27017/crawlab \
  -e CRAWLAB_REDIS_URI=redis://redis:6379/0 \
  crawlab/pro:latest

# Add worker node 2
docker run -d --name crawlab-worker-2 \
  --link crawlab-master:master \
  --link crawlab-mongo:mongo \
  --link crawlab-redis:redis \
  -e CRAWLAB_NODE_MASTER=false \
  -e CRAWLAB_GRPC_ADDRESS=master:9000 \
  -e CRAWLAB_MONGO_URI=mongodb://mongo:27017/crawlab \
  -e CRAWLAB_REDIS_URI=redis://redis:6379/0 \
  crawlab/pro:latest
```
View the master node logs:

```bash
docker logs -f crawlab-master
```
Upgrade to the latest version:

```bash
# Pull the latest image
docker pull crawlab/pro:latest

# Restart the services
docker-compose down
docker-compose up -d
```
Back up the data:

```bash
# Back up the MongoDB data
docker exec crawlab-mongo mongodump --db crawlab --out /data/backup
docker cp crawlab-mongo:/data/backup ./backup

# Back up the task code
docker cp crawlab-master:/app/tasks ./tasks-backup
```
Check the port mapping and firewall settings:
```bash
# Check the container status
docker ps | grep crawlab-master

# Check whether the port is being listened on
netstat -tulpn | grep 8080
```
Check the logs for detailed error information:
```bash
# View a crawler task's log
docker exec -it crawlab-master cat /app/logs/task/[task-id].log
```
Reset the password via environment variables:
```yaml
environment:
  - CRAWLAB_RESET_PASSWORD=true
  - CRAWLAB_USERNAME=admin
  - CRAWLAB_PASSWORD=newpassword
```
After restarting the container, the password is reset; the CRAWLAB_RESET_PASSWORD variable can then be removed.
Crawlab Pro is commercial software; make sure you hold a valid license before using it. See the official license agreement for detailed terms.