milqmedia/alertmanager Docker image

monitoring alertmanager
Downloads: 0 | Status: community image | Maintainer: milqmedia | Repository type: image | Last updated: 7 years ago

swarmprom

Swarmprom is a starter kit for Docker Swarm monitoring with Prometheus, Grafana, cAdvisor, Node Exporter, Alert Manager and Unsee.

Install

Clone this repository and run the monitoring stack:

bash
$ git clone [***]
$ cd swarmprom

ADMIN_USER=admin \
ADMIN_PASSWORD=admin \
SLACK_URL=[***] \
SLACK_CHANNEL=devops-alerts \
SLACK_USER=alertmanager \
docker stack deploy -c docker-compose.yml mon

Prerequisites:

  • Docker CE 17.09.0-ce or Docker EE 17.06.2-ee-3
  • Swarm cluster with one manager and a worker node
  • Docker engine experimental enabled and metrics address set to 0.0.0.0:9323

Services:

  • prometheus (metrics database) http://<swarm-ip>:9090
  • grafana (visualize metrics) http://<swarm-ip>:3000
  • node-exporter (host metrics collector)
  • cadvisor (containers metrics collector)
  • dockerd-exporter (Docker daemon metrics collector, requires Docker experimental metrics-addr to be enabled)
  • alertmanager (alerts dispatcher) http://<swarm-ip>:9093
  • unsee (alert manager dashboard) http://<swarm-ip>:9094
  • caddy (reverse proxy and basic auth provider for prometheus, alertmanager and unsee)

Setup Grafana

Navigate to http://<swarm-ip>:3000 and log in with user admin and password admin. You can change the credentials in the compose file or by supplying the ADMIN_USER and ADMIN_PASSWORD environment variables at stack deploy.

Swarmprom Grafana is preconfigured with two dashboards and Prometheus as the default data source:

  • Name: Prometheus
  • Type: Prometheus
  • Url: [***]
  • Access: proxy

After you log in, click the home drop-down in the upper-left corner and you'll see the dashboards there.

Docker Swarm Nodes Dashboard

[Image: Docker Swarm Nodes dashboard]

URL: http://<swarm-ip>:3000/dashboard/db/docker-swarm-nodes

This dashboard shows key metrics for monitoring the resource usage of your Swarm nodes and can be filtered by node ID:

  • Cluster up-time, number of nodes, number of CPUs, CPU idle gauge
  • System load average graph, CPU usage graph by node
  • Total memory, available memory gauge, total disk space and available storage gauge
  • Memory usage graph by node (used and cached)
  • I/O usage graph (read and write Bps)
  • IOPS usage (read and write operations per second) and CPU IOWait
  • Running containers graph by Swarm service and node
  • Network usage graph (inbound Bps, outbound Bps)
  • Nodes list (instance, node ID, node name)

Docker Swarm Services Dashboard

[Image: Docker Swarm Services dashboard]

URL: http://<swarm-ip>:3000/dashboard/db/docker-swarm-services

This dashboard shows key metrics for monitoring the resource usage of your Swarm stacks and services, and can be filtered by node ID:

  • Number of nodes, stacks, services and running containers
  • Swarm tasks graph by service name
  • Health check graph (total health checks and failed checks)
  • CPU usage graph by service and by container (top 10)
  • Memory usage graph by service and by container (top 10)
  • Network usage graph by service (received and transmitted)
  • Cluster network traffic and IOPS graphs
  • Docker engine container and network actions by node
  • Docker engine list (version, node id, OS, kernel, graph driver)

Prometheus Stats Dashboard

[Image: Prometheus Stats dashboard]

URL: http://<swarm-ip>:3000/dashboard/db/prometheus

  • Uptime, local storage memory chunks and series
  • CPU usage graph
  • Memory usage graph
  • Chunks to persist and persistence urgency graphs
  • Chunks ops and checkpoint duration graphs
  • Target scrapes, rule evaluation duration, samples ingested rate and scrape duration graphs

Prometheus service discovery

In order to collect metrics from Swarm nodes you need to deploy the exporters on each server. Using global services you don't have to manually deploy the exporters. When you scale up your cluster, Swarm will launch a cAdvisor, node-exporter and dockerd-exporter instance on the newly created nodes. All you need is an automated way for Prometheus to reach these instances.

Running Prometheus on the same overlay network as the exporter services allows you to use the DNS service discovery. Using the exporters service name, you can configure DNS discovery:

yaml
scrape_configs:
  - job_name: 'node-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.node-exporter'
      type: 'A'
      port: 9100
  - job_name: 'cadvisor'
    dns_sd_configs:
    - names:
      - 'tasks.cadvisor'
      type: 'A'
      port: 8080
  - job_name: 'dockerd-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.dockerd-exporter'
      type: 'A'
      port: 9323

When Prometheus runs the DNS lookup, Docker Swarm returns a list of IPs, one for each task. Using these IPs, Prometheus bypasses the Swarm load balancer and can scrape each exporter instance directly.
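The lookup described above can be sketched in a few lines of Python. This is only an illustration of what DNS service discovery does with the A records, not how Prometheus implements it; the helper names resolve_task_ips and build_targets are hypothetical, and the lookup only succeeds from a container attached to the same overlay network as the exporters.

```python
import socket


def resolve_task_ips(service_name):
    """Return the A-record IPs Swarm publishes for tasks.<service_name>."""
    # gethostbyname_ex returns (hostname, aliases, ip_addresses)
    _, _, ips = socket.gethostbyname_ex(f"tasks.{service_name}")
    return sorted(ips)


def build_targets(ips, port):
    """Turn the task IPs into host:port scrape targets, one per exporter task."""
    return [f"{ip}:{port}" for ip in ips]


# Inside a container on the overlay network you could run:
#   build_targets(resolve_task_ips("node-exporter"), 9100)
```

Each returned target is an overlay IP, which is why the instance label ends up carrying addresses such as 10.0.0.5:9100 instead of node names.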

The problem with this approach is that you cannot tell which exporter runs on which node. Your Swarm nodes' real IPs differ from the exporters' IPs, because the exporters' IPs are dynamically assigned by Docker and belong to the overlay network. Swarm doesn't provide any records for the tasks DNS besides the overlay IP. If Swarm provided SRV records with the node's hostname or IP, you could re-label the source and overwrite the overlay IP with the real IP.

In order to tell which host a node-exporter instance is running on, I had to create a prom file inside the node-exporter container that holds the hostname and the Docker Swarm node ID.

When a node-exporter container starts, node-meta.prom is generated with the following content (shown as the quoted shell string that is written to the file):

bash
"node_meta{node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1"

The node ID value is supplied via {{.Node.ID}} and the node name is extracted from the /etc/hostname file that is mounted inside the node-exporter container.

yaml
  node-exporter:
    image: stefanprodan/swarmprom-node-exporter
    environment:
      - NODE_ID={{.Node.ID}}
    volumes:
      - /etc/hostname:/etc/nodename
    command:
      - '-collector.textfile.directory=/etc/node-exporter/'

Using the textfile collector flag, you can instruct node-exporter to collect the node_meta metric. Now that you have a metric containing the Docker Swarm node ID and name, you can use it in PromQL queries.
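To make the file format concrete, here is a minimal Python sketch that renders the same node_meta line and writes it into a textfile directory. The function names are hypothetical and the real image generates the file from its own startup script; only the metric name, the two labels, the value 1 and the node-meta.prom file name come from the text above.

```python
from pathlib import Path


def render_node_meta(node_id, node_name):
    """Render the node_meta sample in the Prometheus text exposition format."""
    # Doubled braces are literal braces inside an f-string
    return f'node_meta{{node_id="{node_id}", node_name="{node_name}"}} 1'


def write_node_meta(directory, node_id, node_name):
    """Write node-meta.prom into the directory watched by the textfile collector."""
    path = Path(directory) / "node-meta.prom"
    path.write_text(render_node_meta(node_id, node_name) + "\n")
    return path
```

In the container the directory would be /etc/node-exporter/, the node ID would come from the NODE_ID environment variable and the node name from /etc/nodename.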

Let's say you want to find the available memory on each node. Normally you would write something like this:

sum(node_memory_MemAvailable) by (instance)

{instance="10.0.0.5:9100"} 889450496
{instance="10.0.0.13:9100"} ***
{instance="10.0.0.15:9100"} ***

The above result is not very helpful since you can't tell what Swarm node is behind the instance IP. So let's write that query taking into account the node_meta metric:

promql
sum(node_memory_MemAvailable * on(instance) group_left(node_id, node_name) node_meta) by (node_id, node_name)

{node_id="wrdvtftteo0uaekmdq4dxrn14",node_name="swarm-manager-1"} 889450496
{node_id="moggm3uaq8tax9ptr1if89pi7",node_name="swarm-worker-1"} ***
{node_id="vkdfx99mm5u4xl2drqhnwtnsv",node_name="swarm-worker-2"} ***

This is much better. Instead of overlay IPs, I can now see the actual Docker Swarm node IDs and hostnames. Knowing the hostname of your nodes is useful for alerting as well.
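The effect of the on(instance) group_left(node_id, node_name) join can be imitated with plain dictionaries. The sketch below is a simplified model, not PromQL semantics in full: it assumes at most one node_meta series per instance, and the function name sum_by_node is hypothetical.

```python
from collections import defaultdict


def sum_by_node(samples, node_meta):
    """Join samples to node_meta on `instance`, then sum by (node_id, node_name).

    samples:   {instance: value}
    node_meta: {instance: (node_id, node_name)}, each series having the value 1
    """
    totals = defaultdict(float)
    for instance, value in samples.items():
        labels = node_meta.get(instance)
        if labels is None:
            continue  # no node_meta series for this instance: the sample is dropped
        totals[labels] += value * 1  # multiplying by node_meta's value (1) keeps the sample value
    return dict(totals)


memory = {"10.0.0.5:9100": 889450496}
meta = {"10.0.0.5:9100": ("wrdvtftteo0uaekmdq4dxrn14", "swarm-manager-1")}
result = sum_by_node(memory, meta)
```

Because node_meta always has the value 1, the multiplication leaves the memory value untouched and only copies the two labels across.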

You can define an alert that fires when available memory drops to 10%. The alert message will then carry the hostname rather than an overlay IP that you can't correlate to an infrastructure item.

Maybe you are wondering why you need the node ID if you have the hostname. The node ID will help you match node-exporter instances to cAdvisor instances. All metrics exported by cAdvisor have a label named container_label_com_docker_swarm_node_id, and this label can be used to filter containers metrics by Swarm nodes.

Let's write a query to find out how many containers are running on a Swarm node. Since the node_meta metric gives you all the node IDs, you can define a filter with them in Grafana. Assuming the filter is $node_id, the container count query should look like this:

count(rate(container_last_seen{container_label_com_docker_swarm_node_id=~"$node_id"}[5m])) 

Another use case for node ID is filtering the metrics provided by the Docker engine daemon. Docker engine doesn't have a label with the node ID attached on every metric, but there is a swarm_node_info metric that has this label. If you want to find out the number of failed health checks on a Swarm node you would write a query like this:

sum(engine_daemon_health_checks_failed_total * on(instance) group_left(node_id) swarm_node_info{node_id=~"$node_id"})

For now the engine metrics are still experimental. If you want to use dockerd-exporter you have to enable the experimental feature and set the metrics address to 0.0.0.0:9323.

If you are running Docker with systemd, create or edit the /etc/systemd/system/docker.service.d/docker.conf file like so:

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// \
  --storage-driver=overlay2 \
  --dns 8.8.4.4 --dns 8.8.8.8 \
  --experimental=true \
  --metrics-addr 0.0.0.0:9323

Apply the config changes with systemctl daemon-reload && systemctl restart docker and check if the docker_gwbridge ip address is 172.18.0.1:

bash
ip -o addr show docker_gwbridge
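If you want to script this check, the IPv4 address can be pulled out of the one-line output with a small regular expression. This is a rough sketch: the function name gwbridge_ip is hypothetical, and the sample line is only an illustration of the usual `ip -o addr show` layout, so verify it against the output on your own hosts.

```python
import re


def gwbridge_ip(ip_output):
    """Extract the first IPv4 address from `ip -o addr show docker_gwbridge` output."""
    # "inet " (with the trailing space) skips inet6 lines
    match = re.search(r"\binet (\d{1,3}(?:\.\d{1,3}){3})/\d+", ip_output)
    return match.group(1) if match else None


# Illustrative sample line, not captured from a real host
sample = "3: docker_gwbridge    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge"
address = gwbridge_ip(sample)
```

If the returned address is not 172.18.0.1, that is the value to put in DOCKER_GWBRIDGE_IP.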

Replace 172.18.0.1 with your docker_gwbridge address in the compose file:

yaml
  dockerd-exporter:
    image: stefanprodan/caddy
    environment:
      - DOCKER_GWBRIDGE_IP=172.18.0.1

Collecting Docker Swarm metrics with Prometheus is not a smooth process, and because of group_left, queries tend to become more complex. In the future I hope Swarm DNS will contain the SRV record for the hostname and Docker engine metrics will expose container metrics, replacing cAdvisor altogether.

Configure Prometheus

I've set the Prometheus retention period to 24h and the heap size to 1GB. You can change these values in the compose file.

yaml
  prometheus:
    image: stefanprodan/swarmprom-prometheus
    command:
      - '-storage.local.target-heap-size=***'
      - '-storage.local.retention=24h'
    deploy:
      resources:
        limits:
          memory: 2048M
        reservations:
          memory: 1024M

Set the heap size to a maximum of 50% of the total physical memory.
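As a worked example of the 50% rule, the following sketch computes an upper bound for the heap size in bytes. The function name is hypothetical, and using the 2048M memory limit from the compose snippet as the "total" is an assumption made for illustration; on a real host you would start from the physical memory available to Prometheus.

```python
def max_heap_bytes(total_memory_bytes, fraction=0.5):
    """Return the largest heap size allowed by the 50% rule of thumb."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    return int(total_memory_bytes * fraction)


# 2048 MiB of memory gives a ceiling of 1024 MiB (1 GiB) for the heap
ceiling = max_heap_bytes(2048 * 1024 * 1024)
```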

When using host volumes you should ensure that Prometheus doesn't get scheduled on different nodes. You can pin the Prometheus service on a specific host with placement constraints.

yaml
  prometheus:
    image: stefanprodan/swarmprom-prometheus
    volumes:
      - prometheus:/prometheus
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.monitoring.role == prometheus
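The constraint above only matches once a node carries the monitoring.role=prometheus label. A minimal sketch of adding it with the docker node update --label-add command, wrapped in Python, could look like this; the function name is hypothetical and the node name swarm-worker-1 is just an example taken from the query output earlier.

```python
import subprocess


def label_node_command(node, key="monitoring.role", value="prometheus"):
    """Build the docker CLI call that adds a label to a Swarm node."""
    return ["docker", "node", "update", "--label-add", f"{key}={value}", node]


def label_node(node):
    """Apply the label; must be run against a Swarm manager."""
    subprocess.run(label_node_command(node), check=True)


command = label_node_command("swarm-worker-1")
```

Calling label_node("swarm-worker-1") on a manager before deploying the stack lets the scheduler place the Prometheus task on that node.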

Configure alerting

The swarmprom Prometheus image comes with the following alert rules:

Swarm Node CPU Usage

Alerts when a node CPU usage goes over 80% for five minutes.

ALERT node_cpu_usage
  IF 100 - (avg(irate(node_cpu{mode="idle"}[1m])  * on(instance) group_left(node_name) node_meta * 100) by (node_name)) > 80
  FOR 5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
      summary = "CPU alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} CPU usage is at {{ humanize $value}}%.",
  }

Swarm Node Memory Alert

Alerts when a node memory usage goes over 80% for five minutes.

ALERT node_memory_usage
  IF sum(((node_memory_MemTotal - node_memory_MemAvailable) / node_memory_MemTotal) * on(instance) group_left(node_name) node_meta * 100) by (node_name) > 80
  FOR 5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
      summary = "Memory alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} memory usage is at {{ humanize $value}}%.",
  }

Swarm Node Disk Alert

Alerts when a node storage usage goes over 85% for five minutes.

ALERT node_disk_usage
  IF ((node_filesystem_size{mountpoint="/rootfs"} - node_filesystem_free{mountpoint="/rootfs"}) * 100 / node_filesystem_size{mountpoint="/rootfs"}) * on(instance) group_left(node_name) node_meta > 85
  FOR 5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
      summary = "Disk alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} disk usage is at {{ humanize $value}}%.",
  }

Swarm Node Disk Fill Rate Alert

Alerts when a node's storage is predicted to run out of free space within six hours.

ALERT node_disk_fill_rate_6h
  IF predict_linear(node_filesystem_free{mountpoint="/rootfs"}[1h], 6*3600) * on(instance) group_left(node_name) node_meta < 0
  FOR 1h
  LABELS      { severity="critical" }
  ANNOTATIONS {
      summary = "Disk fill alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} disk is going to fill up in 6h.",
  }
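To see what the fill rate rule is testing, here is a small Python approximation of predict_linear: a least-squares line fitted to the samples in the window and extrapolated forward. It is a sketch under simplifying assumptions (it extrapolates from the last sample's timestamp rather than from the rule evaluation time), and the sample values are made up for illustration.

```python
def predict_linear(samples, seconds_ahead):
    """Fit value = intercept + slope * t by least squares and extrapolate.

    samples: list of (timestamp_seconds, value) pairs, at least two distinct timestamps.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


# Hypothetical free-space samples over one hour: 5 GB shrinking by 1 GB per hour
window = [(0, 5e9), (1800, 4.5e9), (3600, 4e9)]
predicted_free = predict_linear(window, 6 * 3600)
```

With this window the prediction six hours ahead is negative, so the expression `< 0` in the rule would hold and, after staying true for the FOR duration, the alert would fire.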

You can add alerts to the swarm_node and swarm_task files and rerun stack deploy to update them. Because these files are mounted inside the Prometheus container at run time as Docker configs, you don't have to bundle them with the image.

The Alertmanager swarmprom image is configured with the Slack receiver. In order to receive alerts on Slack you have to provide the Slack API url, username and channel via environment variables:

yaml
  alertmanager:
    image: stefanprodan/swarmprom-alertmanager
    environment:
      - SLACK_URL=${SLACK_URL}
      - SLACK_CHANNEL=${SLACK_CHANNEL}
      - SLACK_USER=${SLACK_USER}

You can install the stress package with apt and test out the CPU alert. You should receive something like this:

[Image: Alerts]

Cloudflare has made a great dashboard for managing alerts, called Unsee. Unsee can aggregate alerts from multiple Alertmanager instances, running either in HA mode or separately. You can access Unsee at http://<swarm-ip>:9094 using the admin user and password set at stack deploy:

[Image: Unsee]

Monitoring applications and backend services

You can extend swarmprom with special-purpose exporters for services like MongoDB, PostgreSQL, Kafka, Redis and also instrument your own applications using the Prometheus client libraries.

In order to scrape other services you need to attach those to the mon_net network so Prometheus can reach them. Or you can attach the mon_prometheus service to the networks where your services are running.

Once your services are reachable by Prometheus, you can add the DNS name and port of those services to the Prometheus config using the JOBS environment variable:

yaml
  prometheus:
    image: stefanprodan/swarmprom-prometheus
    environment:
      - JOBS=mongo-exporter:9216 kafka-exporter:9216 redis-exporter:9216
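The text does not show how the image expands JOBS, so the following is only a guess at the shape of the result: a Python sketch that splits the space-separated service:port pairs and builds one DNS-discovery scrape job per entry, mirroring the exporter jobs shown earlier. The function name parse_jobs and the resulting structure are assumptions.

```python
def parse_jobs(jobs):
    """Turn 'svc-a:9216 svc-b:9121' into a list of scrape job definitions."""
    configs = []
    for entry in jobs.split():
        service, sep, port = entry.rpartition(":")
        if not sep or not service or not port.isdigit():
            raise ValueError(f"expected service:port, got {entry!r}")
        configs.append({
            "job_name": service,
            "dns_sd_configs": [
                # tasks.<service> resolves to one A record per running task
                {"names": [f"tasks.{service}"], "type": "A", "port": int(port)},
            ],
        })
    return configs


jobs = parse_jobs("mongo-exporter:9216 kafka-exporter:9216 redis-exporter:9216")
```

Each entry must be resolvable from the Prometheus container, which is why the services need to share a network with it.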

Monitoring production systems

The swarmprom project is meant as a starting point for developing your own monitoring solution. Before running this in production you should consider building and publishing your own Prometheus, node exporter and alert manager images. Docker Swarm doesn't play well with locally built images, so the first step would be to set up a secure Docker registry that your Swarm has access to and push the images there. Your CI system should assign version tags to each image. Don't rely on the latest tag for continuous deployments: Prometheus will soon reach v2 and the data store will not be backwards compatible with v1.x.

Another thing you should consider is having redundancy for Prometheus and alert manager. You could run them as a service with two replicas pinned on different nodes, or even better, use a service like Weave Cloud Cortex to ship your metrics outside of your current setup. You can use Weave Cloud not only as a backup of your metrics database, but also to define alerts and as a data source for your Grafana dashboards. Hosting the alerting and monitoring system on a platform other than your production one is good practice that will allow you to react quickly and efficiently when a major disaster strikes.

Swarmprom comes with built-in Weave Cloud integration. All you need to do is run the weave-compose stack with your Weave service token:

bash
TOKEN=<WEAVE-TOKEN> \
ADMIN_USER=admin \
ADMIN_PASSWORD=admin \
docker stack deploy -c weave-compose.yml mon

This will deploy Weave Scope and Prometheus with Weave Cortex as remote write. The local retention is set to 24h so even if your internet connection drops you'll not lose data as Prometheus will retry pushing data to Weave Cloud when the connection is up again.

You can define alerts and notifications routes in Weave Cloud in the same way you would do with alert manager.

To use Grafana with Weave Cloud you have to reconfigure the Prometheus data source like this:

  • Name: Prometheus
  • Type: Prometheus
  • Url: [***]
  • Access: proxy
  • Basic auth: use your service token as password, the user value is ignored

Weave Scope automatically generates a map of your application, enabling you to intuitively understand, monitor, and control your microservices-based application. You can view metrics, tags and metadata of the running processes, containers and hosts. Scope offers remote access to the Swarm's nodes and containers, making it easy to diagnose issues in real time.

[Image: Scope]

[Image: Scope Hosts]
