An IPMI/BMC Monitoring System Based on Prometheus and Grafana

I. Introduction & Environment Preparation

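A server's BMC/IPMI exposes hardware health data such as CPU temperature, memory temperature, fan speed, and motherboard voltages. By default these values can only be viewed in real time, either by logging in to the BMC web console or through tools such as ipmitool; there are no historical graphs and no alerting. When a server misbehaves, trend graphs of these physical sensor readings complement the logs: they show how the hardware state changed around the time of the incident, and they make it possible to alert automatically when a reading reaches a configured threshold.

Prometheus collects the data, and Grafana handles visualization and alerting; both are required. Installing them is outside the scope of this article, which assumes both are already installed.

For installation instructions, see the previous article…

My Prometheus and Grafana run on a VPS, which makes it easy to monitor multiple servers.

The server is a physical Supermicro machine with the IP 192.168.10.x.

The server runs PVE (Proxmox VE), which is based on Debian.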

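II. Install ipmi_exporter

Download the binary that matches your system from the releases page (the URL appears in the command below) and extract it.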

cd /opt   # the mv command in the next step expects the archive to be extracted under /opt
wget https://github.com/prometheus-community/ipmi_exporter/releases/download/v1.10.0/ipmi_exporter-1.10.0.linux-amd64.tar.gz
tar xf ipmi_exporter-1.10.0.linux-amd64.tar.gz

Rather than adding the directory to PATH, create a symlink in a directory that is already on PATH:

mv /opt/ipmi_exporter-1.10.0.linux-amd64 /opt/ipmi_exporter
ln -sf /opt/ipmi_exporter/ipmi_exporter /usr/local/bin/ipmi_exporter

Verify that the binary runs:

ipmi_exporter -h

III. Write the systemd Service and Install FreeIPMI

ipmi_exporter YAML configuration file

vim /opt/ipmi_exporter/ipmi_remote.yml

modules:
  default:
    # These settings are used if no module is specified, the
    # specified module doesn't exist, or of course if
    # module=default is specified.
    user: "ADMIN"     # IPMI username. Unless you have special requirements, these two credential lines are all you need to fill in.
    pass: "PASSWORD"  # IPMI password
    # The below settings correspond to driver-type, privilege-level, and
    # session-timeout respectively, see `man 5 freeipmi.conf` (and e.g.
    # `man 8 ipmi-sensors` for a list of driver types).
    driver: "LAN_2_0"
    privilege: "user"
    # The session timeout is in milliseconds. Note that a scrape can take up
    # to (session-timeout * #-of-collectors) milliseconds, so set the scrape
    # timeout in Prometheus accordingly.
    # Must be larger than the retransmission timeout, which defaults to 1000.
    timeout: 10000
    # Available collectors are bmc, bmc-watchdog, ipmi, chassis, dcmi, sel,
    # and sm-lan-mode
    # If _not_ specified, bmc, ipmi, chassis, and dcmi are used
    collectors:
      - bmc
      - ipmi
      - chassis
    # Got any sensors you don't care about? Add them here.
    exclude_sensor_ids:
      - 2
      - 29
      - 32
      - 50
      - 52
      - 55
  dcmi:
    # Use these settings when scraped with module=dcmi.
    user: "admin_user"
    pass: "another_pw"
    privilege: "admin"
    driver: "LAN_2_0"
    collectors:
      - dcmi
  thatspecialhost:
    # Use these settings when scraped with module=thatspecialhost.
    user: "some_user"
    pass: "secret_pw"
    privilege: "admin"
    driver: "LAN"
    collectors:
      - ipmi
      - sel
    # Need any special workaround flags set? Add them here.
    # Workaround flags might be needed to address issues with specific vendor implementations
    # e.g. https://www.gnu.org/software/freeipmi/freeipmi-faq.html#Why-is-the-output-from-FreeIPMI-different-than-another-software_003f
    # For a full list of flags, refer to:
    # https://www.gnu.org/software/freeipmi/manpages/man8/ipmi-sensors.8.html#lbAL
    workaround_flags:
      - discretereading
    # If you require additional command line arguments (e.g. --bridge-sensors for ipmimonitoring),
    # you can specify them per collector - BE CAREFUL, you can easily break the exporter with this!
    custom_args:
      ipmi:
        - "--bridge-sensors"
  advanced:
    # Use these settings when scraped with module=advanced.
    user: "some_user"
    pass: "secret_pw"
    privilege: "admin"
    driver: "LAN"
    collectors:
      - ipmi
      - sel
    # USING ANY OF THE BELOW VOIDS YOUR WARRANTY! YOU MAY GET BITTEN BY SHARKS!
    # You can override the command to be executed for a collector. Paired with
    # custom_args, this can be used to e.g. execute the IPMI tools with sudo:
    collector_cmd:
      ipmi: sudo
      sel: sudo
    custom_args:
      ipmi:
        - "ipmimonitoring"
      sel:
        - "ipmi-sel"
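
The timeout comment in the default module implies a simple budget: in the worst case a scrape takes the session timeout multiplied by the number of collectors. A minimal sketch of that arithmetic follows; the function name worst_case_scrape_ms is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Worst-case scrape time = session timeout (ms) x number of collectors,
# as described in the comment above the "timeout" setting.
# worst_case_scrape_ms is a hypothetical helper, not part of ipmi_exporter.
worst_case_scrape_ms() {
    timeout_ms=$1
    collectors=$2
    echo $(( timeout_ms * collectors ))
}

# default module: timeout 10000 ms, collectors bmc + ipmi + chassis
worst_case_scrape_ms 10000 3
```

For the default module this prints 30000, i.e. 30 seconds, which is the budget to keep in mind when choosing scrape_timeout for the Prometheus job.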

Edit the service file

vim /etc/systemd/system/ipmi_exporter.service

[Unit]
Description=IPMI Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/ipmi_exporter --config.file=/opt/ipmi_exporter/ipmi_remote.yml

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

systemctl daemon-reload
systemctl enable ipmi_exporter --now

Verify that the service is running and reports no errors:

journalctl -u ipmi_exporter.service -f

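Install FreeIPMI

ipmi_exporter relies on the FreeIPMI command-line tools to query the BMC, so install FreeIPMI on the host that runs the exporter. The package manager command is enough:

Distribution      Install command
Arch Linux        pacman -Sy extra/freeipmi
CentOS/Red Hat    yum install freeipmi -y
Debian/Ubuntu     apt install freeipmi -y
Gentoo            emerge --ask freeipmi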
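
After installing, it can help to confirm that the FreeIPMI programs are actually on PATH and to try one query by hand before relying on the exporter. The sketch below is an illustration under assumptions: missing_tools and ipmi_test_cmd are hypothetical helpers, the tool list assumes the default program for each collector (no collector_cmd override), and the host and credentials are placeholders.

```shell
#!/bin/sh
# Print every tool from the argument list that is NOT found on PATH.
# missing_tools is a hypothetical helper for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Build (but do not run) a manual FreeIPMI query that mirrors the default
# module: driver LAN_2_0, privilege level user.
# Arguments: BMC host, IPMI user, IPMI password (all placeholders here).
ipmi_test_cmd() {
    printf 'ipmimonitoring -D LAN_2_0 -h %s -u %s -p %s -l user\n' "$1" "$2" "$3"
}

# Assumed mapping: ipmi -> ipmimonitoring, bmc -> bmc-info,
# chassis -> ipmi-chassis, dcmi -> ipmi-dcmi, sel -> ipmi-sel.
missing_tools ipmimonitoring bmc-info ipmi-chassis ipmi-dcmi ipmi-sel

# Print the command to try by hand against the BMC.
ipmi_test_cmd 192.168.10.x ADMIN PASSWORD
```

If the first call prints nothing, all listed tools were found. Running the printed ipmimonitoring command manually should list the sensors; if it fails, the exporter will most likely fail in the same way.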

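IV. Configure Prometheus and Verify Metrics

1. Configure ipmi_targets

Write a targets file listing the IPMI (BMC) IPs to monitor. These are not the IPs of the operating system installed on the servers: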

vim /usr/local/prometheus/ipmi_targets.yml

- targets:
    - xx.xx.xx.xx   # IPMI IP to be monitored
  labels:
    job: ipmi_exporter
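
When several servers are monitored, the targets file simply lists one BMC address per entry. As a rough sketch, the file could be generated from a list of addresses; gen_ipmi_targets is a hypothetical helper and the addresses are placeholders from a documentation range, not real BMCs.

```shell
#!/bin/sh
# Emit a file_sd target group for any number of IPMI/BMC addresses.
# gen_ipmi_targets is a hypothetical helper; the job label matches
# the targets file shown above.
gen_ipmi_targets() {
    printf '%s\n' '- targets:'
    for ip in "$@"; do
        printf '    - %s\n' "$ip"
    done
    printf '%s\n' '  labels:'
    printf '%s\n' '    job: ipmi_exporter'
}

# Placeholder addresses, for illustration only.
gen_ipmi_targets 192.0.2.10 192.0.2.11
```

Redirecting the output to /usr/local/prometheus/ipmi_targets.yml would replace the hand-written file.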

2. Configure Prometheus

Append the following ipmi_exporter job to the main Prometheus configuration file. Prometheus will then collect metrics from the host running the ipmi_exporter service:

vim /usr/local/prometheus/prometheus.yml

  - job_name: ipmi_exporter
    params:
      module: ['default']
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /ipmi
    scheme: http
    file_sd_configs:
      - files:
          - /usr/local/prometheus/ipmi_targets.yml   # must match the path of the targets file created in step 1
        refresh_interval: 5m
    relabel_configs:
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        target_label: __param_target
        replacement: ${1}
        action: replace
      - source_labels: [__param_target]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: ${1}
        action: replace
      - separator: ;
        regex: .*
        target_label: __address__
        replacement: xx.xx.xx.xx:9290   # IP of the host running ipmi_exporter, not the IPMI IP; I use the IP assigned by Tailscale here
        action: replace
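
The three relabel rules are easier to follow with a concrete target. The sketch below only mimics what Prometheus does internally for one discovered address; relabel_preview is a hypothetical helper, both addresses are placeholders, and the order of the query parameters in the real request may differ.

```shell
#!/bin/sh
# Mimic the three relabel rules of the job above for one discovered target.
# relabel_preview is a hypothetical helper; Prometheus does this internally.
relabel_preview() {
    target=$1                 # __address__ as read from ipmi_targets.yml (the BMC IP)
    exporter=$2               # host:port of ipmi_exporter (replacement in rule 3)
    param_target=$target      # rule 1: __address__ -> __param_target
    instance=$param_target    # rule 2: __param_target -> instance
    address=$exporter         # rule 3: __address__ is overwritten
    printf 'instance=%s\n' "$instance"
    printf 'url=http://%s/ipmi?module=default&target=%s\n' "$address" "$param_target"
}

# Placeholder BMC address and exporter address.
relabel_preview 192.0.2.10 198.51.100.5:9290
```

The result is that Prometheus connects to the exporter host, passes the BMC address as the target query parameter, and keeps the BMC address as the instance label.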

After saving the configuration, restart the Prometheus service.

3. Verify metrics collection

Open the Prometheus web page, find the ipmi_exporter job we added, and make sure its state is UP.

Clicking the Endpoint link shows the raw metrics collected by ipmi_exporter.
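
The raw output can also be checked from a shell. ipmi_exporter reports an ipmi_up metric per collector, where 0 means that collector failed. The sketch below filters such lines; failed_collectors is a hypothetical helper and the sample input is made up for illustration.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print the collectors
# whose ipmi_up value is 0. failed_collectors is a hypothetical helper.
failed_collectors() {
    sed -n 's/^ipmi_up{collector="\([^"]*\)"} 0$/\1/p'
}

# Made-up sample in the exposition format, for illustration only.
failed_collectors <<'EOF'
ipmi_up{collector="bmc"} 1
ipmi_up{collector="chassis"} 0
ipmi_up{collector="ipmi"} 1
EOF
```

Against a live exporter, the same filter could be fed with the output of curl on the exporter's /ipmi endpoint, using the exporter address and the target and module query parameters from the Prometheus job.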

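V. Configure the Grafana Dashboard

Import the dashboard template

With the metrics available, the last step is to display them as graphs. Here we import an existing dashboard template.

In the Grafana main panel, click the Import option.

Enter 15765 as the ID, then click Load.

Select Prometheus as the data source.

That is all. You can adjust the template to suit your own setup.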


Original post: https://blog.quickso.cn/2025/03/14/基于Prometheus和Grafana的IPMI-BMC监控系统/
Author: 木子欢儿
Published: March 14, 2025