[Metrics Monitoring] Prometheus, a Cloud-Native Monitoring System

2019/12/08

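Prometheus is one of today's most popular cloud-native monitoring systems and was the second project to join the Cloud Native Computing Foundation, after Kubernetes. It is easy to install and use, highly extensible, and integrates well with other tools. Its query language, PromQL, makes it easy to query and aggregate metric data.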

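Goals of Monitoring

According to Site Reliability Engineering: How Google Runs Production Systems, a monitoring system needs to support both white-box and black-box monitoring effectively. White-box monitoring exposes the internal running state of a system; by watching its metrics we can anticipate problems and remove potential risks before they materialize. Black-box monitoring, typically HTTP probes and TCP probes, lets the right people be notified quickly when a system or service fails. A well-built monitoring system serves the following purposes:

Long-term trend analysis: by continuously collecting and aggregating samples, we can analyze long-term trends in the metrics. For example, by tracking the growth rate of disk usage we can predict when capacity will need to be expanded.
Comparative analysis: how does resource usage differ between two versions of a system? How do concurrency and load change under different capacity configurations? Monitoring makes it easy to track and compare such behavior.
Alerting: when a system fails or is about to fail, the monitoring system must react quickly and notify the administrators, so that the problem can be handled promptly or prevented before it affects the business.
Fault analysis and localization: once a problem has occurred, it has to be investigated and fixed. Analyzing the different monitoring metrics together with historical data helps find and resolve the root cause.
Data visualization: dashboards give a direct, intuitive view of the system's running state, resource usage, and service status.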

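Why Use Prometheus

The core of Prometheus is a single binary with no third-party dependencies (no database, cache, and so on). All it needs is a local disk, so there is no risk of cascading failures from dependencies. (Prometheus is not only a monitoring system but also a time series database. It does not use an existing time series database as its storage backend, because the monitoring system should have the characteristics of a time series database while remaining very easy to deploy and maintain.)
Because Prometheus is built on a pull model, the monitoring system can be set up anywhere. For more complex environments, Prometheus service discovery can manage the monitoring targets dynamically.
Multi-dimensional data model: all collected monitoring data is stored as metrics in the built-in time series database (TSDB). Besides the metric name, every sample carries a set of labels describing its characteristics; a time series is identified by a metric name together with a set of label values.
Powerful query language, PromQL: Prometheus has a built-in query language, PromQL, for querying and aggregating monitoring data. PromQL is also used for data visualization (for example in Grafana) and for alerting. With PromQL it is easy to answer questions such as:
- What was the 95th percentile of application latency over a past period of time?
- What will disk usage roughly look like 4 hours from now?
- Which 5 services have the highest CPU usage? (filtering)
Prometheus processes this data efficiently. A single Prometheus Server instance can handle millions of metrics and hundreds of thousands of data points per second.
Scalability: every data center and every team can run its own independent Prometheus Server. Prometheus supports federation, which lets multiple Prometheus instances form one logical cluster. When a single Prometheus Server has too much work, it can be scaled out through functional partitioning (sharding) combined with federation.
Easy integration: client libraries are available for many languages. With these SDKs an application can quickly be brought under Prometheus monitoring, or you can write your own collector. In addition, a large number of third-party exporters are available, such as Oracle DB Exporter, PostgreSQL exporter, Redis exporter, NVIDIA GPU exporter, RabbitMQ exporter, Nginx metric library, and JMX exporter.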
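
The disk-space question in the list above comes down to fitting a line through recent samples and extrapolating it; PromQL provides a predict_linear function for this, based on simple linear regression. The following is only a minimal Python sketch of that idea, not Prometheus's implementation. The function name forecast_linear and the sample data are made up for illustration, and the sketch extrapolates from the last sample rather than from an evaluation timestamp.

```python
def forecast_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    extrapolate it seconds_ahead past the last timestamp."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Slope = covariance(t, v) / variance(t)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + seconds_ahead) + intercept


# Hypothetical disk usage in GiB, sampled every 30 minutes.
usage = [(0, 100.0), (1800, 101.0), (3600, 102.0), (5400, 103.0)]
forecast = forecast_linear(usage, 4 * 3600)  # usage expected 4 hours later
```

With the made-up data growing by 1 GiB every 30 minutes, the forecast four hours after the last sample lands 8 GiB above the last value.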

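Installing Prometheus Server

Install Prometheus Server with Docker:

docker run -d -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Open http://127.0.0.1:9090/ to see the Prometheus home page.

Alternatively, download the release archive, extract it, and run:

./prometheus --config.file=prometheus.yml

In the Prometheus architecture, Prometheus Server does not monitor specific targets directly; its main job is to collect and store data and to expose it for querying. To actually monitor something, such as a host's CPU usage, we need an exporter. Node Exporter collects host metrics such as CPU, memory, and disk. Run node_exporter, which listens on port 9100, and the metrics are visible at http://localhost:9100/metrics.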
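
To make the role of an exporter concrete: it is essentially an HTTP endpoint that returns current metric values as plain text. The sketch below uses only the Python standard library; it is not how Node Exporter is implemented. The metric demo_uptime_seconds, the port 9101, and the helper names collect, render_metrics, and serve are assumptions made for illustration.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()


def collect():
    """Return {metric name: (type, help text, value)}; a hypothetical demo metric."""
    return {
        "demo_uptime_seconds": (
            "gauge",
            "Seconds since this process started.",
            time.monotonic() - START,
        ),
    }


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (mtype, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9101):
    """Serve /metrics until interrupted (the port number is arbitrary)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding localhost:9101 as a scrape target would let Prometheus pull this metric the same way it pulls from Node Exporter. In practice the official client libraries take care of the rendering and the HTTP handling.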

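To have Prometheus pull the node metrics, configure prometheus.yml. The job named prometheus ships with the default configuration and can be ignored for now. metrics_path defaults to /metrics; when you later monitor your own system and want a different path, add or change this property. After adding the node job below and starting Prometheus, the node metrics can be searched from the Expression box on the Prometheus home page.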

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

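Prometheus Architecture

At the center of the architecture is Prometheus Server. Its most important function is collecting data, mainly by pulling from jobs and exporters. Short-lived jobs can instead push their data to the Pushgateway, from which Prometheus Server pulls periodically. Targets can also be found through service discovery, which is useful when there are many nodes to monitor and the set of nodes changes dynamically. Prometheus Server stores the data on disk and exposes an HTTP service through which external clients run PromQL queries.

Alerting conditions can also be configured. When a condition is met, the alert is pushed to Alertmanager, which handles the notifications (the alert itself is generated by Prometheus), for example deduplicating alerts and then sending them by SMS or email.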

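Prometheus Core Concepts

Data Model

Metric names and labels: every time series is uniquely identified by its metric name and optional key-value pairs called labels.
Samples: samples form the actual time series data. Each sample consists of a float64 value and a millisecond-precision timestamp.

Notation: the textual representation of a time series, combining the metric name and its labels:

<metric name>{<label name>=<label value>, ...}
api_http_requests_total{method="POST", handler="/messages"}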
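
As a small illustration of this notation, the following Python sketch formats a metric name and a label set into a series identifier. The helper name series_id is hypothetical, labels are sorted only to make the output deterministic, and escaping of special characters in label values is left out.

```python
def series_id(name, labels):
    """Format a metric name and its labels as <name>{k="v", ...}."""
    if not labels:
        return name
    # Sort by label name so that the same label set always yields the same text.
    pairs = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{pairs}}}"


a = series_id("api_http_requests_total", {"method": "POST", "handler": "/messages"})
b = series_id("api_http_requests_total", {"handler": "/messages", "method": "POST"})
```

Both calls describe the same time series, because identity depends on the metric name and the set of label pairs, not on the order in which the labels are written.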

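Metric Types

Counter: a cumulative metric representing a monotonically increasing counter; its value only goes up, or resets to zero on restart. Typical uses are the number of requests served and the number of tasks completed or failed.
Gauge: a metric whose value can go up and down, such as current memory usage.
Histogram: used for sampling observations. The value range is divided into buckets and every observation is counted in the matching bucket, so that the distribution can be analyzed.
Summary: similar to a histogram, but it mainly provides quantiles of the observed values.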
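
The difference between the types is easier to see in code. Below is a simplified Python sketch of a counter, a gauge, and a histogram; it is not the official client library, and the class layout and bucket bounds are assumptions chosen for illustration. Summary is omitted because computing streaming quantiles needs more machinery.

```python
import bisect


class Counter:
    """Monotonically increasing value."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Value that can go up and down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations per bucket and tracks their sum."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Return (upper bound, cumulative count) pairs, as exposed via the le label."""
        total, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            out.append((bound, total))
        return out


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
buckets = latency.cumulative()
```

The bucket counts are cumulative: the bucket with upper bound 0.5 also includes the observation that fell below 0.1, and the +Inf bucket equals the total number of observations.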

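Jobs and Instances

Job and instance: an instance is an endpoint that can be scraped, and a job is a collection of instances with the same purpose.

A job can have multiple instances. For example, to monitor an application you can create one job for it; if the application runs on several servers, the job has several instances. So far each job has defined its targets through static configuration (static_configs). Besides statically listing the instance addresses of each job, Prometheus can integrate with DNS, Consul, EC2, Kubernetes, and others to discover instances automatically and scrape them. The /targets page of the Prometheus UI shows all current jobs and the instances that belong to each.

After a scrape, Prometheus automatically attaches the following labels to the scraped samples: job, the job the target belongs to, and instance, the instance the samples were scraped from.

In addition, on every scrape Prometheus automatically stores a sample in each of the following time series: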

up{job="", instance=""}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
scrape_duration_seconds{job="", instance=""}: duration of the scrape.
scrape_samples_post_metric_relabeling{job="", instance=""}: the number of samples remaining after metric relabeling was applied.
scrape_samples_scraped{job="", instance=""}: the number of samples the target exposed.
scrape_series_added{job="", instance=""}: the approximate number of new series in this scrape. New in v2.10

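Prometheus Configuration

For all options, see the official Prometheus configuration documentation.

Contents of prometheus.yml:

global:  # Global settings; they apply to every other section and serve as its defaults
  scrape_interval:     15s # Scrape interval, default 1m
  evaluation_interval: 15s # Rule evaluation interval, default 1m
  # scrape_timeout  # Scrape timeout, default 10s
 
alerting: # Alertmanager configuration
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
 
rule_files: # Recording rule and alerting rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
 

scrape_configs: # Scrape configuration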
  - job_name: 'prometheus'
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
    - targets: ['localhost:9090']

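Scrape configuration:

# Job name
job_name: <job_name>

# Scrape interval; defaults to the global setting
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Scrape timeout; defaults to the global setting
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# Path of the scrape endpoint; defaults to /metrics
[ metrics_path: <path> | default = /metrics ]

# Whether labels already present in the scraped data take precedence
# over conflicting labels that Prometheus would attach itself
[ honor_labels: <boolean> | default = false ]

# Protocol scheme; defaults to http, https is also supported
[ scheme: <scheme> | default = http ]

# Optional URL query parameters for the scrape request
params:
  [ <string>: [<string>, ...] ]
  
# Target configuration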
static_configs:
  [ - <static_config> ... ]

# Sets the `Authorization` header on every scrape request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

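Target configuration (static_config):

# List of target addresses
targets:
  [ - '<host>' ]

# Labels attached to all metrics scraped from these targets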
labels:
  [ <labelname>: <labelvalue> ... ]
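
To see how these fields combine, the following Python sketch derives the URLs that a scrape job would request from its scheme, targets, metrics_path, and params. It mirrors the defaults listed above but is only an illustration; the function name scrape_urls and the example jobs are hypothetical, and authentication, timeouts, and relabeling are ignored.

```python
from urllib.parse import urlencode


def scrape_urls(job):
    """Build the scrape URL of every static target in one scrape job."""
    scheme = job.get("scheme", "http")            # default: http
    path = job.get("metrics_path", "/metrics")    # default: /metrics
    query = urlencode(job.get("params", {}), doseq=True)
    urls = []
    for static_config in job.get("static_configs", []):
        for target in static_config.get("targets", []):
            url = f"{scheme}://{target}{path}"
            if query:
                url += "?" + query
            urls.append(url)
    return urls


node_job = {
    "job_name": "node",
    "static_configs": [{"targets": ["localhost:9100"]}],
}
probe_job = {  # hypothetical job that overrides the defaults
    "job_name": "probe",
    "scheme": "https",
    "metrics_path": "/probe",
    "params": {"module": ["http_2xx"]},
    "static_configs": [{"targets": ["localhost:9115"]}],
}
node_urls = scrape_urls(node_job)
probe_urls = scrape_urls(probe_job)
```

For the node job every default applies, so the only URL is http://localhost:9100/metrics, which matches the address used earlier to inspect Node Exporter by hand.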

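PromQL

PromQL Data Types

Instant vector: a set of time series, each containing a single sample
Range vector: a set of time series, each containing multiple samples over a range of time
Scalar: a floating-point number
String: a string value; currently unused

Time Series Selectors

Instant Vector Selectors

An instant vector selector selects, for a set of time series, the sample value of each at a given point in time. In the simplest case only a metric name is given, which selects the current sample of every time series with that metric name.

http_requests_total

The series can be filtered by appending a set of label key-value pairs in curly braces:

http_requests_total{job="prometheus",group="canary"}

Labels can be matched against an exact value or against a regular expression. There are four matching operators: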

=: Select labels that are exactly equal to the provided string.
!=: Select labels that are not equal to the provided string.
=~: Select labels that regex-match the provided string.
!~: Select labels that do not regex-match the provided string.

For example:

http_requests_total{environment=~"staging|testing|development",method!="GET"}

The metric name can also be matched through the internal label __name__: the expression http_requests_total can be written as {__name__="http_requests_total"}, and {__name__=~"customized.*"} matches every time series whose metric name starts with customized.
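
The four matchers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Prometheus code: series are modelled as plain dicts with the metric name stored under __name__, and Python's re module stands in for the RE2 syntax that Prometheus uses. Two details do reflect real behavior: regex matchers are fully anchored, and a label that is absent is treated as the empty string.

```python
import re


def matches(labels, name, op, value):
    """Apply one label matcher to a series' label dict."""
    actual = labels.get(name, "")  # an absent label behaves like ""
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # anchored at both ends
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator: {op}")


def select(series, matchers):
    """Keep the series that satisfy every matcher."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]


series = [  # hypothetical label sets
    {"__name__": "http_requests_total", "environment": "staging", "method": "POST"},
    {"__name__": "http_requests_total", "environment": "staging", "method": "GET"},
    {"__name__": "http_requests_total", "environment": "production", "method": "POST"},
    {"__name__": "customized_jobs_total", "environment": "testing", "method": "PUT"},
]
selected = select(series, [
    ("__name__", "=", "http_requests_total"),
    ("environment", "=~", "staging|testing|development"),
    ("method", "!=", "GET"),
])
customized = select(series, [("__name__", "=~", "customized.*")])
```

Because of the anchoring, the pattern staging matches the value staging but not staging2; a pattern such as staging.* is needed for prefix matching.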

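Range Vector Selectors

Unlike an instant vector selector, a range vector selector appends a duration in square brackets:

http_requests_total{job="prometheus"}[5m]

The following duration units are available: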

s - seconds
m - minutes
h - hours
d - days
w - weeks
y - years

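Offset Modifier

The two selectors above are evaluated at the current time, or look back from the current time. To select samples at a past point in time, or to look back from a past point in time, use the offset modifier.

# Sample values of all series named http_requests_total as of 5 minutes ago
http_requests_total offset 5m

# Samples of http_requests_total over the 5-minute window that ended at this time one week ago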
http_requests_total[5m] offset 1w
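
The effect of the range duration and the offset on which samples are selected can be sketched in Python as follows. The function names and the sample data are hypothetical. The 300-second lookback mirrors Prometheus's default lookback delta of 5 minutes, while the exact handling of window boundaries is a simplification and varies between Prometheus versions.

```python
def instant_value(samples, eval_time, offset=0, lookback=300):
    """Most recent sample at or before (eval_time - offset), within the lookback."""
    t_ref = eval_time - offset
    candidates = [(t, v) for t, v in samples if t_ref - lookback < t <= t_ref]
    if not candidates:
        return None
    return max(candidates, key=lambda sample: sample[0])[1]


def range_values(samples, eval_time, window, offset=0):
    """All samples in the window of the given length ending at (eval_time - offset)."""
    end = eval_time - offset
    start = end - window
    return [(t, v) for t, v in samples if start < t <= end]


# One hypothetical series: a sample every 60 s, with the value counting the minutes.
samples = [(t, t // 60) for t in range(0, 601, 60)]
now = 600

current = instant_value(samples, now)                          # like: metric
five_min_ago = instant_value(samples, now, offset=300)         # like: metric offset 5m
last_five = range_values(samples, now, window=300)             # like: metric[5m]
shifted = range_values(samples, now, window=300, offset=120)   # like: metric[5m] offset 2m
```

The offset only moves the reference time; the window length stays the same, so last_five and shifted contain the same number of samples taken from different parts of the series.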

Binary Operators

+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (modulo)
^ (power/exponentiation)

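Traditional binary operators work between scalars; here they can also be used between a vector and a scalar, and between two vectors. The vectors in binary operations are always instant vectors, never range vectors.

Scalar and scalar: ordinary arithmetic.
Vector and scalar: the operator is applied between the scalar and the value of every sample in the vector, and the results form a new vector.
Vector and vector: for each element of the left-hand vector, a matching element is looked up in the right-hand vector (see the vector matching rules). The operation is applied to each matched pair and the results form a new vector. Elements without a match are dropped.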
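
The vector-and-scalar case can be sketched as follows. An instant vector is modelled here as a list of (labels, value) pairs; this representation, the function name vector_scalar, and the metric demo_bytes are assumptions for illustration. The sketch drops the metric name from the result, as PromQL arithmetic does, but edge cases such as division by zero or modulo of negative numbers are not modelled faithfully.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
    "^": operator.pow,
}


def vector_scalar(vector, op, scalar):
    """Apply `value <op> scalar` to every sample of an instant vector."""
    fn = OPS[op]
    result = []
    for labels, value in vector:
        # The result of arithmetic is no longer the original metric,
        # so the metric name is dropped.
        out = {k: v for k, v in labels.items() if k != "__name__"}
        result.append((out, fn(value, scalar)))
    return result


free_bytes = [  # hypothetical instant vector
    ({"__name__": "demo_bytes", "instance": "a:9100"}, 536870912.0),
    ({"__name__": "demo_bytes", "instance": "b:9100"}, 1073741824.0),
]
free_mib = vector_scalar(free_bytes, "/", 1024 * 1024)  # like: demo_bytes / 1024 / 1024
```

Every element keeps its remaining labels, so the two instances stay distinguishable in the result.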


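Vector Matching Rules

There are two kinds of matching: one-to-one and one-to-many/many-to-one:

one-to-one: each element on the left matches exactly one element on the right. Matching compares the label key-value pairs (the metric name is not considered). ignoring lists labels to exclude from the comparison, and on lists the only labels to compare.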

method_code:http_errors:rate5m{method="get", code="500"}  24
method_code:http_errors:rate5m{method="get", code="404"}  30
method_code:http_errors:rate5m{method="put", code="501"}  3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21

method:http_requests:rate5m{method="get"}  600
method:http_requests:rate5m{method="del"}  34
method:http_requests:rate5m{method="post"} 120

Query:
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
Result:
{method="get"}  0.04            //  24 / 600
{method="post"} 0.05            //   6 / 120

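one-to-many/many-to-one: one element on one side can match several elements on the other side. group_left or group_right must be used to state which side, left or right, has the higher cardinality:

Query:
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
Result: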
{method="get", code="500"}  0.04            //  24 / 600
{method="get", code="404"}  0.05            //  30 / 600
{method="post", code="500"} 0.05            //   6 / 120
{method="post", code="404"} 0.175           //  21 / 120
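
Both matching modes can be reproduced with a short Python sketch that runs on the example data above. It is a simplified model, not the PromQL engine: vectors are lists of (labels, value) pairs without metric names, only division is implemented, and only ignoring, on, and group_left are supported. The function names match_key and divide are made up for illustration.

```python
def match_key(labels, ignoring=(), on=None):
    """Label pairs that take part in matching."""
    if on is not None:
        return frozenset((k, v) for k, v in labels.items() if k in on)
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)


def divide(left, right, ignoring=(), on=None, group_left=False):
    """left / right with one-to-one or many-to-one (group_left) matching."""
    index = {}
    for labels, value in right:
        key = match_key(labels, ignoring, on)
        if key in index:
            raise ValueError("several right-hand elements share one match key")
        index[key] = value

    result, seen = [], set()
    for labels, value in left:
        key = match_key(labels, ignoring, on)
        if key not in index:
            continue  # no partner on the right: the element is dropped
        if group_left:
            out = dict(labels)  # keep all labels of the "many" side
        else:
            if key in seen:
                raise ValueError("several left-hand elements match; use group_left")
            seen.add(key)
            out = dict(key)     # only the matched labels remain
        result.append((out, value / index[key]))
    return result


errors = [
    ({"method": "get", "code": "500"}, 24),
    ({"method": "get", "code": "404"}, 30),
    ({"method": "put", "code": "501"}, 3),
    ({"method": "post", "code": "500"}, 6),
    ({"method": "post", "code": "404"}, 21),
]
requests = [
    ({"method": "get"}, 600),
    ({"method": "del"}, 34),
    ({"method": "post"}, 120),
]

errors_500 = [s for s in errors if s[0]["code"] == "500"]
one_to_one = divide(errors_500, requests, ignoring=("code",))
many_to_one = divide(errors, requests, ignoring=("code",), group_left=True)
```

The put and del series have no partner on the other side and therefore do not appear in either result. Without group_left, the second query would fail in this sketch because two error series per method match a single request series.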