Black-box monitoring with Prometheus and blackbox_exporter

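The biggest difference between black-box and white-box monitoring is that black-box monitoring is failure-oriented: when a failure occurs, it detects it quickly. White-box monitoring, by contrast, focuses on proactively discovering or predicting potential problems. A complete monitoring setup should uncover potential problems from the white-box side and quickly detect problems that have already occurred from the black-box side.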

Blackbox Exporter is the official black-box monitoring solution provided by the Prometheus community. It lets users probe network endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.

1. Download blackbox_exporter:

https://github.com/prometheus/blackbox_exporter

2. Configure blackbox_exporter

2.1 Deploy blackbox_exporter

tar xf downloads/blackbox_exporter-0.14.0.linux-amd64.tar.gz
mv -f blackbox_exporter-0.14.0.linux-amd64 /work/admin/blackbox_exporter

blackbox_exporter must either run as root or have the CAP_NET_RAW capability, which the ICMP prober needs. Here the capability is granted to the binary:

sudo setcap cap_net_raw+ep /work/admin/blackbox_exporter/blackbox_exporter

2.2 Configure blackbox_exporter

cat blackbox_exporter/blackbox.yml

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"
      #method: GET
      #valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      #valid_status_codes: [200]
  http_post_2xx:
    prober: http
    http:
      method: POST
      preferred_ip_protocol: "ip4"
      headers:
        Content-Type: application/json  # request header
      body: '{"hmac":"","params":{"publicFundsKeyWords":"box"}}'  # request body
  tcp_connect:
    prober: tcp
    timeout: 5s
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 2s
    icmp:
      preferred_ip_protocol: "ip4"
      # source_ip_address: "192.168.6.216"
  dns_soa:
    prober: dns
    dns:
      query_name: "prometheus.io"
      query_type: "SOA"
  dns_tcp_example:
    prober: dns
    dns:
      transport_protocol: "tcp"  # defaults to "udp"
      preferred_ip_protocol: "ip4"  # defaults to "ip6"
      query_name: "www.prometheus.io"
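
To make the query_response steps in the tcp modules more concrete, here is a minimal Python sketch of the expect/send idea: each expect is a regular expression matched against a line received from the server, and a send may reuse a captured group through ${1}. The helper name and the sample lines are hypothetical, and Python's re module stands in for the Go regular expressions the exporter itself uses, so treat this purely as an illustration.

```python
import re
from typing import Optional


def respond(expect: str, send_template: str, line: str) -> Optional[str]:
    """Return the reply for `line`, or None when `expect` does not match.

    Hypothetical helper: mimics one expect/send step of a query_response list.
    """
    match = re.search(expect, line)
    if match is None:
        return None
    # Replace ${N} placeholders with the corresponding captured group.
    return re.sub(
        r"\$\{(\d+)\}",
        lambda m: match.group(int(m.group(1))),
        send_template,
    )


# The sample lines below are made up for illustration.
ssh_reply = respond(r"^SSH-2.0-", "", "SSH-2.0-OpenSSH_8.9")
irc_reply = respond(r"PING :([^ ]+)", "PONG ${1}", "PING :abc123")
no_match = respond(r"^SSH-2.0-", "", "220 mail.example.com ESMTP")
```

In the irc_banner module, the captured token from the server's PING is echoed back in the PONG, which is what the ${1} placeholder expresses.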

2.3 Start blackbox_exporter

/work/admin/blackbox_exporter/blackbox_exporter --config.file=/work/admin/blackbox_exporter/blackbox.yml
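
Once the exporter is running, a probe can be triggered by hand by requesting /probe with module and target parameters. The response is in the Prometheus text format and includes probe_success (1 for success, 0 for failure). The sketch below parses such a response; the function name and the sample text are illustrative assumptions rather than captured output, and labelled samples are skipped for brevity.

```python
from typing import Dict


def parse_probe_output(text: str) -> Dict[str, float]:
    """Parse unlabelled samples from Prometheus text-format output."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, HELP/TYPE comments, and labelled samples.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples


# Illustrative sample only; real output contains more metrics.
SAMPLE = """\
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.012
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
"""

metrics = parse_probe_output(SAMPLE)
probe_ok = metrics.get("probe_success") == 1.0
```

Against a live exporter, the text would come from a request such as http://127.0.0.1:9115/probe?module=icmp&target=192.168.4.58, assuming the exporter listens locally on its default port 9115.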

2.4 Manage the process with supervisor

[program:blackbox_exporter_product]
command=/work/admin/blackbox_exporter/blackbox_exporter --config.file=/work/admin/blackbox_exporter/blackbox.yml
directory=/work/admin/blackbox_exporter
redirect_stderr=true
autorestart=true

3. Prometheus configuration

3.1 ICMP monitoring

scrape_configs:
  - job_name: 'blackbox_product_icmp'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['192.168.4.58']
        labels: {cluster: 'product',type: 'icmp',env: 'activemq',job: 'activemq1',export: 'ping'}
      - targets: ['192.168.1.231']
        labels: {cluster: 'product',type: 'icmp',env: 'activemq',job: 'activemq2',export: 'ping'}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.6.216:9115
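
The three relabel rules are what route the scrape through the exporter instead of sending it to the target directly. As a rough illustration, the following Python sketch applies the same steps to a plain dictionary of labels. It is a simplified model written for this article (the function name is hypothetical), not how Prometheus implements relabeling.

```python
from typing import Dict


def relabel_for_blackbox(labels: Dict[str, str], exporter: str) -> Dict[str, str]:
    """Apply the three relabel steps from the job above to a copy of `labels`."""
    out = dict(labels)
    # 1. The configured target becomes the ?target= URL parameter.
    out["__param_target"] = out["__address__"]
    # 2. The same value is kept as the instance label so series stay readable.
    out["instance"] = out["__param_target"]
    # 3. The scrape itself is sent to the blackbox_exporter address.
    out["__address__"] = exporter
    return out


before = {"__address__": "192.168.4.58", "cluster": "product"}
after = relabel_for_blackbox(before, "192.168.6.216:9115")
```

The order matters: the original address has to be copied into __param_target before __address__ is overwritten with the exporter's address.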

3.2 HTTP monitoring

  - job_name: 'blackbox_product_http'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['https://upload.feiersmart.com/static/agreement/privacy.html']
        labels: {cluster: 'product',type: 'https',env: 'http',job: 'upload',export: 'http'}
      - targets: ['https://gitlab.feiersmart.com']
        labels: {cluster: 'product',type: 'http',env: 'http',job: 'gitlab',export: 'http'}
      - targets: ['http://file.feiersmart.com/1/']
        labels: {cluster: 'product',type: 'http',env: 'http',job: 'file',export: 'http'}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.6.216:9115
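
After relabeling, Prometheus effectively requests the exporter's /probe endpoint with the module and the target as query parameters. The sketch below builds that URL in Python so the encoding of an HTTPS target is visible. The helper name is hypothetical; the addresses are the ones used in the job above.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, module: str, target: str) -> str:
    """Build the /probe URL that is scraped for one target."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


url = probe_url("192.168.6.216:9115", "http_2xx", "https://gitlab.feiersmart.com")
```

Opening the resulting URL in a browser or with curl is a quick way to check a single target before adding it to the Prometheus configuration.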

3.3 POST monitoring

  - job_name: 'blackbox_product_post'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /probe
    params:
      module: [http_post_2xx]
    static_configs:
      - targets:
          - http://ms-gateway.feiersmart.com/v1/config/getrunconfig.do
        labels: {cluster: 'product',type: 'http',env: 'post',job: 'ms-gateway',export: 'post'}
      - targets:
          - http://aiui-msp.feiersmart.openspeech.cn/musicservice/useservice.do
        labels: {cluster: 'product',type: 'http',env: 'http',job: 'aiui-msp',export: 'post'}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.6.216:9115
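
When a POST probe fails, it can help to reproduce the request outside the exporter. The sketch below builds, without sending, the kind of request the http_post_2xx module describes, using only Python's standard library. The helper name is an assumption; the body and header are copied from the module definition.

```python
import json
from urllib.request import Request


def build_post_probe(url: str) -> Request:
    """Build (but do not send) a POST like the one http_post_2xx describes."""
    body = json.dumps(
        {"hmac": "", "params": {"publicFundsKeyWords": "box"}},
        separators=(",", ":"),  # compact form, matching the module's body
    )
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_post_probe("http://ms-gateway.feiersmart.com/v1/config/getrunconfig.do")
```

Passing req to urllib.request.urlopen with a timeout would actually send it; a 2xx status is what the module treats as success by default.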

3.4 TCP monitoring

  - job_name: 'blackbox_aftertreatment_tcp'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['172.22.123.100:6379']
        labels: {cluster: 'aftertreatment',type: 'tcp',env: 'xfyun-bj-tools',job: 'xfyun-bj-tools1',export: 'redis-server'}
      - targets: ['172.22.123.100:8080']
        labels: {cluster: 'aftertreatment',type: 'tcp',env: 'xfyun-bj-tools',job: 'xfyun-bj-tools1',export: 'apache-tomcat-7.0.70(mspsystem/xy-mspsystem)'}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.22.123.100:9115
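
The tcp_connect module has no query_response steps, so it only checks that a TCP connection can be established within the timeout. A rough Python equivalent for spot checks might look like this, assuming a hypothetical helper name and the same 5s timeout as the module:

```python
import socket


def tcp_connect_ok(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, tcp_connect_ok("172.22.123.100", 6379) would check the Redis port from the job above, provided the machine running the script can reach that address.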

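3.5 DNS monitoring

Note that with the dns prober each target is the DNS server that receives the query, while the name being resolved comes from query_name in the module (prometheus.io in the dns_soa module).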

  - job_name: 'blackbox_aftertreatment_dns'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /probe
    params:
      module: [dns_soa]
    static_configs:
      - targets: ['feier.ai']
        labels: {cluster: 'aftertreatment',type: 'dns',env: 'xfyun-bj-tools',job: 'xfyun-bj-tools1',export: 'overseas domain resolution'}
      - targets: ['feiersmart.com']
        labels: {cluster: 'aftertreatment',type: 'dns',env: 'xfyun-bj-tools',job: 'xfyun-bj-tools1',export: 'website domain resolution'}
      - targets: ['kohlerchina.azure-api.cn']
        labels: {cluster: 'aftertreatment',type: 'dns',env: 'xfyun-bj-tools',job: 'xfyun-bj-tools1',export: 'partner domain resolution'}
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.22.123.100:9115

3.6 Restart Prometheus

Before restarting, check the configuration file for errors:

./promtool check config prometheus.yml

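If the lifecycle API is enabled (Prometheus started with --web.enable-lifecycle), the configuration can be hot-reloaded instead: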

curl -X POST http://127.0.0.1:9090/-/reload

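4. Grafana dashboards

Grafana dashboard ID 9965 is recommended.

The dashboard shown in the screenshot includes some of my own changes and is for reference only.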
