target/pod-reaper, otherwise it exits with an error. The main environment variables fall into the following categories:
| Category | Environment variables |
|---|---|
| Basic configuration | `NAMESPACE`, `GRACE_PERIOD`, `SCHEDULE`, `RUN_DURATION`, `EVICT`, `DRY_RUN` |
| Label/annotation filtering | `EXCLUDE_LABEL_KEY`, `EXCLUDE_LABEL_VALUES`, `REQUIRE_LABEL_KEY`, `REQUIRE_LABEL_VALUES`, `REQUIRE_ANNOTATION_KEY`, `REQUIRE_ANNOTATION_VALUES` |
| Logging | `LOG_LEVEL`, `LOG_FORMAT` |
| Rules | `CHAOS_CHANCE`, `CONTAINER_STATUSES`, `POD_STATUSES`, `MAX_DURATION`, `MAX_UNREADY` |
Example configuration:
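```sh
# Basic configuration
# (shown as an environment listing; quote values that contain spaces when setting them from a shell)
NAMESPACE=test
SCHEDULE=@every 30s
RUN_DURATION=15m
EXCLUDE_LABEL_KEY=pod-reaper
EXCLUDE_LABEL_VALUES=disabled,false
DRY_RUN=false

# Enabled rules (at least one is required)
CHAOS_CHANCE=.001   # chaos chance rule
```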
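The "at least one rule" requirement can be sketched as a small startup check. This is only an illustration, not pod-reaper's actual code: the function names and the error message are hypothetical, and the rule variable names are taken from the table above.

```python
# Rule variables as listed in the "Rules" row of the table above.
RULE_VARS = (
    "CHAOS_CHANCE",
    "CONTAINER_STATUSES",
    "POD_STATUSES",
    "MAX_DURATION",
    "MAX_UNREADY",
)


def enabled_rules(env):
    """Return the rule variables that are set to a non-empty value."""
    return [name for name in RULE_VARS if env.get(name)]


def check_config(env):
    """Raise if no rule is enabled; otherwise return the enabled rules."""
    rules = enabled_rules(env)
    if not rules:
        # Hypothetical message; the real tool's wording may differ.
        raise ValueError("at least one rule must be configured")
    return rules


if __name__ == "__main__":
    import os

    print(check_config(dict(os.environ)))
```

Passing the environment in as a plain mapping keeps the check easy to exercise without touching the real process environment.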
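| Variable | Description |
|---|---|
| `NAMESPACE` | Default: `""` (watch all namespaces). The namespace that pod-reaper operates on. |
| `GRACE_PERIOD` | Default: `nil` (use the Pod's default graceful termination period). Go `time.Duration` format (for example `"1h15m30s"`); `0s` means an immediate hard kill. |
| `SCHEDULE` | Default: `"@every 1m"` (check once a minute). Cron format (`"* * * * *"`, optionally with second-level precision as `"* * * * * *"`) or interval format (such as `"@every 30s"`). Examples: `"@every 1h2m3s"` (every 1 hour, 2 minutes, 3 seconds), `"12 * * * * *"` (at second 12 of every minute). |
| `RUN_DURATION` | Default: `"0s"` (run indefinitely). Go `time.Duration` format (`"15m"` means 15 minutes). Exclude pod-reaper itself (see `EXCLUDE_LABEL_KEY` / `EXCLUDE_LABEL_VALUES`) so that it does not delete itself and cut the run short. |
| `EVICT` | Default: `false`. Uses the eviction API instead of deleting Pods directly. Boolean, with the same accepted values as `DRY_RUN`. |
| `DRY_RUN` | Default: `false`. Accepted values: `1`/`t`/`T`/`TRUE`/`true`/`True` (enabled); `0`/`f`/`F`/`FALSE`/`false`/`False` (disabled). |
| `EXCLUDE_LABEL_KEY` and `EXCLUDE_LABEL_VALUES` | A Pod is excluded if it has a label whose key is `EXCLUDE_LABEL_KEY` and whose value is in the `EXCLUDE_LABEL_VALUES` list. Example: `EXCLUDE_LABEL_KEY=pod-reaper`, `EXCLUDE_LABEL_VALUES=disabled,false` excludes Pods labeled `pod-reaper: disabled` or `pod-reaper: false`. |
| `REQUIRE_LABEL_KEY` and `REQUIRE_LABEL_VALUES` | A Pod is considered only if it has a label whose key is `REQUIRE_LABEL_KEY` and whose value is in the `REQUIRE_LABEL_VALUES` list. |
| `REQUIRE_ANNOTATION_KEY` and `REQUIRE_ANNOTATION_VALUES` | Same as `REQUIRE_LABEL_KEY` / `REQUIRE_LABEL_VALUES`, but based on Pod annotations. |
| `LOG_LEVEL` | Default: `Info`. One of `Debug`, `Info`, `Warning`, `Error`, `Fatal`, `Panic`. |
| `LOG_FORMAT` | Default: `Logrus`. `Logrus`: the default format; `Fluentd`: a format suited to Fluentd/Stackdriver. |

Example log output:

```json
{"level":"info","msg":"loaded rule: chaos chance .3","time":"2017-10-18T17:09:25Z"}
{"level":"info","msg":"executing reap cycle","time":"2017-10-18T17:09:55Z"}
{"level":"info","msg":"reaping pod","pod":"hello-cloud-deployment-3026746346-bj65k","reasons":["was flagged for chaos","has been running for 3m6.257891269s"],"time":"2017-10-18T17:09:55Z"}
```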
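The exclude/require label filters described above can be sketched as a single predicate. This is a simplified model under stated assumptions: `in_scope` and its parameters are hypothetical names, Pod labels are represented as a plain dictionary, and the annotation filter (which works the same way on annotations) is left out.

```python
def in_scope(labels, exclude_key=None, exclude_values=(),
             require_key=None, require_values=()):
    """Return True if a Pod with these labels would still be considered."""
    # Excluded: the Pod carries the exclude key with one of the listed values.
    if exclude_key is not None and labels.get(exclude_key) in exclude_values:
        return False
    # Required: the Pod must carry the require key with one of the listed values.
    if require_key is not None and labels.get(require_key) not in require_values:
        return False
    return True


if __name__ == "__main__":
    # EXCLUDE_LABEL_KEY=pod-reaper, EXCLUDE_LABEL_VALUES=disabled,false
    exclude_values = "disabled,false".split(",")
    for labels in ({"pod-reaper": "disabled"}, {"app": "web"}):
        print(labels, in_scope(labels, "pod-reaper", exclude_values))
```

A Pod that lacks the exclude key is never excluded, while a Pod that lacks the require key is never considered; the two filters are deliberately asymmetric.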
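`CHAOS_CHANCE` is a floating-point value in the range [0, 1). On each check a random number is generated for the Pod; if it is less than this value, the Pod is flagged.

```sh
SCHEDULE=@every 30s   # check every 30 seconds
CHAOS_CHANCE=.01      # 1% chance of deleting each matching Pod
```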
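Because the draw is repeated on every scheduled check, a small chance adds up quickly. The sketch below illustrates the comparison and the cumulative effect; the function names are hypothetical, the draws are assumed to be independent, and the chaos rule is assumed to be the only rule loaded.

```python
import random


def chaos_flags_pod(chance, draw=random.random):
    """Flag the Pod when a uniform draw from [0, 1) is below `chance`."""
    return draw() < chance


def survival_probability(chance, checks):
    """Probability that a Pod is never flagged across `checks` independent draws."""
    return (1.0 - chance) ** checks


if __name__ == "__main__":
    # SCHEDULE=@every 30s gives 120 checks per hour.
    checks_per_hour = 3600 // 30
    p_survive = survival_probability(0.01, checks_per_hour)
    print(f"chance a Pod survives one hour: {p_survive:.2f}")
```

With `CHAOS_CHANCE=.01` and a 30-second schedule, a given Pod has only about a 30% chance of surviving a full hour, so the schedule interval matters as much as the chance itself.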
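Use the exclusion label (`EXCLUDE_LABEL_KEY` / `EXCLUDE_LABEL_VALUES`) to exclude critical Pods.

`CONTAINER_STATUSES` is a comma-separated list of container statuses (no spaces). If a container is in the Waiting or Terminated state and its status matches one in the list, the Pod is flagged.

```sh
SCHEDULE=@every 10m                                        # check every 10 minutes
CONTAINER_STATUSES=ImagePullBackOff,ErrImagePull,Error    # container statuses to match
```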
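The container status check can be sketched as follows. This is an assumption-laden model rather than pod-reaper's implementation: container statuses are represented as dictionaries shaped like the `status.containerStatuses` entries of a Pod in the Kubernetes API, the status to match is assumed to be the `reason` field of the `waiting` or `terminated` state, and the function name is hypothetical.

```python
def container_status_matches(container_statuses, reap_statuses):
    """Return True if any container is Waiting or Terminated with a listed reason."""
    for status in container_statuses:
        state = status.get("state") or {}
        for phase in ("waiting", "terminated"):
            reason = (state.get(phase) or {}).get("reason")
            if reason in reap_statuses:
                return True
    return False


if __name__ == "__main__":
    # CONTAINER_STATUSES=ImagePullBackOff,ErrImagePull,Error
    reap_statuses = set("ImagePullBackOff,ErrImagePull,Error".split(","))
    pod_containers = [
        {"name": "app", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
        {"name": "sidecar", "state": {"running": {}}},
    ]
    print(container_status_matches(pod_containers, reap_statuses))
```

A single matching container is enough to flag the whole Pod; running containers are ignored because only the Waiting and Terminated states are inspected.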
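Note that container statuses do not cover statuses that describe the whole Pod (such as `Evicted`).

`POD_STATUSES` is a comma-separated list of Pod statuses (no spaces). If the Pod's status matches one in the list, the Pod is flagged for deletion.

```sh
SCHEDULE=@every 10m            # check every 10 minutes
POD_STATUSES=Evicted,Unknown   # Pod statuses to match
```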
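`MAX_DURATION` is a Go `time.Duration` value. If a Pod has been running longer than this value, it is flagged for deletion.

```sh
SCHEDULE=@every 5m   # check every 5 minutes
MAX_DURATION=2h      # delete Pods that have been running for more than 2 hours
```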
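To make the duration format and the comparison concrete, here is a simplified sketch. It is not Go's `time.ParseDuration`: the parser below is a hypothetical helper that handles only the `h`, `m`, `s`, and `ms` units, and the Pod's start time is passed in directly rather than read from the cluster.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")


def parse_duration(text):
    """Parse a subset of Go's duration syntax, e.g. '2h' or '1h15m30s'."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unsupported duration: {text!r}")
    seconds = sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _TOKEN.findall(text))
    return timedelta(seconds=seconds)


def exceeds_max_duration(start_time, max_duration, now):
    """Return True if the Pod has been running longer than `max_duration`."""
    return now - start_time > parse_duration(max_duration)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    started = now - timedelta(hours=3)
    print(exceeds_max_duration(started, "2h", now))
```

The same comparison applies to any rule expressed as a duration; only the reference timestamp changes.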
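`MAX_UNREADY` is a Go `time.Duration` value. If a Pod has been unready for longer than this value, it is flagged for deletion.

```sh
SCHEDULE=@every 5m   # check every 5 minutes
MAX_UNREADY=10m      # delete Pods that have been unready for more than 10 minutes
```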
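pod-reaper relies on the permissions of its Kubernetes service account, so RBAC must be configured to allow listing and deleting Pods (or using the eviction API).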
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reaper
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reaper-role
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
    # If EVICT is enabled, also grant "create" on the "pods/eviction"
    # subresource (used to create Evictions).
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reaper-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: pod-reaper
    namespace: default
roleRef:
  kind: Role
  name: pod-reaper-role
  apiGroup: rbac.authorization.k8s.io
```
Example Deployment (long-running):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-reaper
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pod-reaper
  template:
    metadata:
      labels:
        app: pod-reaper
        pod-reaper: disabled   # keeps pod-reaper from deleting itself
    spec:
      serviceAccountName: pod-reaper
      containers:
        - name: pod-reaper
          image: target/pod-reaper:latest
          env:
            - name: NAMESPACE
              value: "default"
            - name: SCHEDULE
              value: "@every 1m"
            - name: EXCLUDE_LABEL_KEY
              value: "pod-reaper"
            - name: EXCLUDE_LABEL_VALUES
              value: "disabled"
            - name: CHAOS_CHANCE
              value: ".005"   # 0.5% chaos chance
```
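Example one-off Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-reaper-once
  labels:
    pod-reaper: disabled   # keeps pod-reaper from deleting itself
spec:
  serviceAccountName: pod-reaper
  restartPolicy: Never     # run once
  containers:
    - name: pod-reaper
      image: target/pod-reaper:latest
      env:
        - name: SCHEDULE
          value: "@every 30s"   # check every 30 seconds
        - name: RUN_DURATION
          value: "15m"          # exit after running for 15 minutes
        # The exclusion label above only takes effect when these are set.
        - name: EXCLUDE_LABEL_KEY
          value: "pod-reaper"
        - name: EXCLUDE_LABEL_VALUES
          value: "disabled"
        - name: CHAOS_CHANCE
          value: ".3"           # 30% chaos chance
```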
```sh
# Example: delete Pods that have been running for more than 2 hours
# AND have been unready for more than 10 minutes
MAX_DURATION=2h
MAX_UNREADY=10m
```
```sh
# Instance 1: chaos chance rule
CHAOS_CHANCE=.01

# Instance 2: max duration rule
MAX_DURATION=2h
```
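The two examples above differ in how rules combine: rules set on one instance must all flag a Pod before it is deleted, while separate instances act independently. A minimal sketch of that logic, with hypothetical function names and each rule reduced to a boolean result:

```python
def instance_reaps(rule_results):
    """One instance: reap only if at least one rule is loaded and every rule flags the Pod."""
    return bool(rule_results) and all(rule_results)


def any_instance_reaps(instances):
    """Several instances: the Pod is reaped if any single instance would reap it."""
    return any(instance_reaps(results) for results in instances)


if __name__ == "__main__":
    # One instance with MAX_DURATION=2h and MAX_UNREADY=10m:
    # the Pod is old enough but has not been unready long enough.
    print(instance_reaps([True, False]))

    # Two instances, one rule each: the second instance alone flags the Pod.
    print(any_instance_reaps([[False], [True]]))
```

Splitting rules across instances is therefore the way to get "either rule" behavior, at the cost of running one pod-reaper per rule.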
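Use the exclusion label (`EXCLUDE_LABEL_KEY` / `EXCLUDE_LABEL_VALUES`) to label pod-reaper itself so that it does not delete itself. Enabling `EVICT` respects PodDisruptionBudgets and is suitable for production environments. Run with `DRY_RUN=true` first to verify what the rules would match, then enable real deletion.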
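The dry-run practice can be sketched as a guard around the delete call. This is an illustration only: `parse_bool` mirrors the accepted `DRY_RUN` values listed earlier, while `reap` and the `delete` callback are hypothetical stand-ins for the real deletion or eviction request.

```python
TRUE_VALUES = {"1", "t", "T", "TRUE", "true", "True"}
FALSE_VALUES = {"0", "f", "F", "FALSE", "false", "False"}


def parse_bool(text):
    """Parse a boolean setting using the accepted values for DRY_RUN."""
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {text!r}")


def reap(pod_names, dry_run, delete):
    """Call `delete` for each flagged Pod, or only report it when dry_run is set."""
    for name in pod_names:
        if dry_run:
            print(f"dry run: would reap {name}")
        else:
            delete(name)


if __name__ == "__main__":
    deleted = []
    reap(["web-1", "web-2"], parse_bool("true"), deleted.append)
    print(deleted)
```

Keeping the dry-run decision at the last step means the rules and filters run exactly as they would in a real cycle, so the reported Pods are the ones that would actually be deleted.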


