Deploying and Configuring Prometheus-Operator for Cloud-Native Monitoring


Prerequisites

GitHub repository:
https://github.com/prometheus-operator/prometheus-operator

Available installation methods:

Installing prometheus-operator

This article installs prometheus-operator with Helm.

Downloading the chart

  1. Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  2. Search for the chart. The first result, kube-prometheus-stack, is the current chart for prometheus-operator
helm search repo prometheus
NAME                                                      CHART VERSION        APP VERSION        DESCRIPTION                                       
prometheus-community/kube-prometheus-stack                42.2.0               0.60.1             kube-prometheus-stack collects Kubernetes manif...
prometheus-community/prometheus                           19.0.0               v2.40.5            Prometheus is a monitoring system and time seri...
prometheus-community/prometheus-adapter                   3.4.2                v0.10.0            A Helm chart for k8s prometheus adapter   
……
  3. Download the chart
helm pull prometheus-community/kube-prometheus-stack --version=42.2.0

Configuring values.yaml
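
One possible starting point, offered as a sketch rather than the author's procedure, is to dump the chart's default values into a file and edit that file before installing. The helper name export_default_values is hypothetical; helm show values and the -f flag are standard Helm commands.

```shell
# Hypothetical helper: write a chart's default values to a file for editing.
# Arguments: chart reference, chart version, output file.
export_default_values() {
  helm show values "$1" --version "$2" > "$3"
}

# Example (needs the prometheus-community repo added as shown earlier):
#   export_default_values prometheus-community/kube-prometheus-stack 42.2.0 values.yaml
# After editing values.yaml, pass it to the install command with -f:
#   helm install prometheus-stack kube-prometheus-stack-42.2.0.tgz -n prometheus -f values.yaml
```

Keeping overrides in a file makes them repeatable across reinstalls, which manual edits to live objects are not.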

Installing the chart

kubectl create ns prometheus
helm install prometheus-stack kube-prometheus-stack-42.2.0.tgz -n prometheus

View the resources in the prometheus namespace

$ kubectl get all -n prometheus
NAME                                                         READY   STATUS             RESTARTS      AGE
pod/alertmanager-prometheus-stack-kube-prom-alertmanager-0   2/2     Running            1 (32s ago)   103s
pod/prometheus-prometheus-stack-kube-prom-prometheus-0       2/2     Running            0             102s
pod/prometheus-stack-grafana-67f9c54566-cqqdg                3/3     Running            0             107s
pod/prometheus-stack-kube-prom-admission-patch-4z5ks         0/1     CrashLoopBackOff   3 (36s ago)   102s
pod/prometheus-stack-kube-prom-operator-689885654-c2znh      1/1     Running            0             107s
pod/prometheus-stack-kube-state-metrics-59fbfbfd5f-mjjjn     0/1     ImagePullBackOff   0             107s
pod/prometheus-stack-prometheus-node-exporter-bdxrq          1/1     Running            0             107s
pod/prometheus-stack-prometheus-node-exporter-j8fpn          1/1     Running            0             107s
pod/prometheus-stack-prometheus-node-exporter-jf65w          1/1     Running            0             107s
pod/prometheus-stack-prometheus-node-exporter-rms29          1/1     Running            0             107s

NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   103s
service/prometheus-operated                         ClusterIP   None             <none>        9090/TCP                     102s
service/prometheus-stack-grafana                    ClusterIP   10.100.46.155    <none>        80/TCP                       107s
service/prometheus-stack-kube-prom-alertmanager     ClusterIP   10.110.149.34    <none>        9093/TCP                     107s
service/prometheus-stack-kube-prom-operator         ClusterIP   10.105.139.158   <none>        443/TCP                      107s
service/prometheus-stack-kube-prom-prometheus       ClusterIP   10.109.41.102    <none>        9090/TCP                     107s
service/prometheus-stack-kube-state-metrics         ClusterIP   10.96.77.114     <none>        8080/TCP                     107s
service/prometheus-stack-prometheus-node-exporter   ClusterIP   10.98.46.213     <none>        9100/TCP                     107s

NAME                                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-stack-prometheus-node-exporter   4         4         4       4            4           <none>          107s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-stack-grafana              1/1     1            1           107s
deployment.apps/prometheus-stack-kube-prom-operator   1/1     1            1           107s
deployment.apps/prometheus-stack-kube-state-metrics   0/1     1            0           107s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-stack-grafana-67f9c54566              1         1         1       107s
replicaset.apps/prometheus-stack-kube-prom-operator-689885654    1         1         1       107s
replicaset.apps/prometheus-stack-kube-state-metrics-59fbfbfd5f   1         1         0       107s

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-stack-kube-prom-alertmanager   1/1     103s
statefulset.apps/prometheus-prometheus-stack-kube-prom-prometheus       1/1     102s

NAME                                                   COMPLETIONS   DURATION   AGE
job.batch/prometheus-stack-kube-prom-admission-patch   0/1           102s       102s

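Two pods fail to start. kubectl describe shows that kube-state-metrics cannot pull its image, so edit deployment.apps/prometheus-stack-kube-state-metrics directly and change the image from

image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0

to

image: bitnami/kube-state-metrics:2.7.0

After saving, the Deployment recreates its pod automatically. After a short wait the pod starts successfully, and kube-prom-admission-patch recovers on its own as well.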
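
The same image change can be made without an interactive editor. The sketch below is a minimal, non-authoritative alternative: swap_image is a hypothetical helper name, and the "*=" form assumes every container in the Deployment should get the new image (the kube-state-metrics Deployment is assumed to run a single container).

```shell
# Hypothetical helper: replace the image of all containers in a Deployment,
# then wait for the rollout to finish.
# Arguments: namespace, deployment name, new image.
swap_image() {
  ns="$1"; deploy="$2"; image="$3"
  # "*=" applies the new image to every container in the pod template.
  kubectl -n "$ns" set image "deployment/$deploy" "*=$image" &&
    kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=180s
}

# Example:
#   swap_image prometheus prometheus-stack-kube-state-metrics bitnami/kube-state-metrics:2.7.0
```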

Viewing the CRDs

Installing the operator adds the following CRDs:

$ kubectl get crd
NAME                                                  CREATED AT
alertmanagerconfigs.monitoring.coreos.com             2022-12-04T13:46:48Z
alertmanagers.monitoring.coreos.com                   2022-12-04T13:46:48Z
podmonitors.monitoring.coreos.com                     2022-12-04T13:46:48Z
probes.monitoring.coreos.com                          2022-12-04T13:46:48Z
prometheuses.monitoring.coreos.com                    2022-12-04T13:46:48Z
prometheusrules.monitoring.coreos.com                 2022-12-04T13:46:48Z
servicemonitors.monitoring.coreos.com                 2022-12-04T13:46:48Z
thanosrulers.monitoring.coreos.com                    2022-12-04T13:46:49Z
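
To confirm that all eight CRDs from the listing are present, a small check such as the following may help. It is a sketch: missing_crds is a hypothetical helper, and the list of names is copied from the output above, so it would need adjusting for other chart versions.

```shell
# Hypothetical helper: print the name of every expected CRD that is absent.
# Prints nothing when all CRDs from the listing above exist.
missing_crds() {
  for kind in alertmanagerconfigs alertmanagers podmonitors probes \
              prometheuses prometheusrules servicemonitors thanosrulers; do
    kubectl get crd "$kind.monitoring.coreos.com" >/dev/null 2>&1 ||
      echo "$kind.monitoring.coreos.com"
  done
}

# Example:
#   missing_crds
```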

Installing Prometheus through the operator

After helm install, a Prometheus CR is created automatically in the prometheus namespace and reconciled by the operator. To view it:

$ kubectl get Prometheus -n prometheus 
NAME                                    VERSION   DESIRED   READY   RECONCILED   AVAILABLE   AGE
prometheus-stack-kube-prom-prometheus   v2.39.1   1         1       True         True        12m

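If it is not installed, or if another Prometheus is needed in a different namespace (useful for users who are not cluster administrators and cannot access the administrator-installed Prometheus), submit a CR like the following: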

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-stack
    meta.helm.sh/release-namespace: prometheus
  creationTimestamp: "2023-11-25T13:19:58Z"
  generation: 1
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 42.2.0
    chart: kube-prometheus-stack-42.2.0
    heritage: Helm
    release: prometheus-stack
  name: prometheus-1
  namespace: prometheus
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: prometheus-stack-kube-prom-alertmanager
      namespace: prometheus
      pathPrefix: /
      port: http-web
  enableAdminAPI: true
  evaluationInterval: 30s
  externalUrl: http://prometheus-1.prometheus:9090
  hostNetwork: false
  image: quay.io/prometheus/prometheus:v2.39.1
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus-stack
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: prometheus-stack
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: prometheus-stack
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-stack-kube-prom-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-stack
  shards: 1
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 2Gi
        storageClassName: nfs-storage
  version: v2.39.1
  walCompression: true
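
The manifest above was exported from a running cluster, so it carries server-populated fields such as creationTimestamp and generation that can be omitted when creating a new object. As a hedged sketch, a much smaller CR that relies on operator defaults could be generated as below. The helper name minimal_prometheus_cr is hypothetical, and the selector label is taken from the manifest above.

```shell
# Hypothetical helper: print a minimal Prometheus CR on stdout.
# Arguments: CR name, namespace, service account name.
minimal_prometheus_cr() {
  name="$1"; ns="$2"; sa="$3"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: $name
  namespace: $ns
spec:
  replicas: 1
  serviceAccountName: $sa
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-stack
EOF
}

# Example:
#   minimal_prometheus_cr prometheus-1 prometheus prometheus-stack-kube-prom-prometheus | kubectl apply -f -
```

Fields left out here (image, retention, storage, and so on) fall back to the operator's defaults, so add them back when the defaults do not fit.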

The remaining parameters of the Prometheus CR can be viewed with the explain command:

kubectl explain Prometheus

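Exposing the service

The Prometheus services installed by default are all of type ClusterIP. They can be exposed through Ingress, NodePort, or LoadBalancer. For example, with NodePort:

kubectl edit svc prometheus-stack-kube-prom-prometheus -n prometheus

Change type: ClusterIP to NodePort, then check the port: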

$ kubectl get svc -n prometheus 
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   25m
prometheus-operated                         ClusterIP   None             <none>        9090/TCP                     25m
prometheus-stack-grafana                    ClusterIP   10.100.46.155    <none>        80/TCP                       25m
prometheus-stack-kube-prom-alertmanager     ClusterIP   10.110.149.34    <none>        9093/TCP                     25m
prometheus-stack-kube-prom-operator         ClusterIP   10.105.139.158   <none>        443/TCP                      25m
prometheus-stack-kube-prom-prometheus       NodePort    10.109.41.102    <none>        9090:30633/TCP               25m
prometheus-stack-kube-state-metrics         ClusterIP   10.96.77.114     <none>        8080/TCP                     25m
prometheus-stack-prometheus-node-exporter   ClusterIP   10.98.46.213     <none>        9100/TCP                     25m

The Prometheus page can now be opened at:
http://192.168.126.100:30633
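
The NodePort switch can also be done non-interactively. The following is a minimal sketch under the assumption that the first port of the service is the one of interest; expose_nodeport is a hypothetical helper name.

```shell
# Hypothetical helper: switch a Service to NodePort and print the node port
# allocated for its first port.
# Arguments: namespace, service name.
expose_nodeport() {
  ns="$1"; svc="$2"
  kubectl -n "$ns" patch svc "$svc" -p '{"spec":{"type":"NodePort"}}' >/dev/null &&
    kubectl -n "$ns" get svc "$svc" -o 'jsonpath={.spec.ports[0].nodePort}'
}

# Example:
#   expose_nodeport prometheus prometheus-stack-kube-prom-prometheus
```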

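Post-deployment issues

kubelet port 10250 unreachable

Solution:

  1. Edit kube-controller-manager.yaml and change --bind-address to 0.0.0.0

  2. Edit kube-scheduler.yaml and change --bind-address to 0.0.0.0

The problem is resolved after these changes.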
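
The two manifest edits above can be scripted. The sketch below assumes the flag appears in the manifest as --bind-address=<IPv4 address>; set_bind_address_all is a hypothetical helper, and the paths in the example are the usual kubeadm locations, which may differ on other installations.

```shell
# Hypothetical helper: rewrite --bind-address=<ip> to --bind-address=0.0.0.0
# in a manifest file, editing it in place through a temporary copy.
# Argument: path to the manifest.
set_bind_address_all() {
  manifest="$1"
  tmp=$(mktemp) || return 1
  # Write back with cat so the file keeps its owner and permissions.
  sed 's/--bind-address=[0-9.]*/--bind-address=0.0.0.0/' "$manifest" > "$tmp" &&
    cat "$tmp" > "$manifest"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (assumed kubeadm paths, run on each control-plane node):
#   set_bind_address_all /etc/kubernetes/manifests/kube-controller-manager.yaml
#   set_bind_address_all /etc/kubernetes/manifests/kube-scheduler.yaml
```

The kubelet watches the static pod manifest directory and restarts the affected pods once the files change, so no separate restart command is needed.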

kube-proxy port 10249 unreachable


Solution:

kubectl edit configmap kube-proxy -n kube-system

Change metricsBindAddress to metricsBindAddress: 0.0.0.0:10249

Restart the kube-proxy pods:

kubectl get pods -n kube-system | grep kube-proxy |awk '{print $1}'|xargs kubectl delete pods -n kube-system
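
An alternative to deleting the pods by name is a rolling restart of the DaemonSet. This sketch assumes kube-proxy runs as a DaemonSet named kube-proxy in kube-system, as on kubeadm clusters; restart_kube_proxy is a hypothetical helper name.

```shell
# Hypothetical helper: restart kube-proxy through its DaemonSet and wait
# until the new pods are ready.
restart_kube_proxy() {
  kubectl -n kube-system rollout restart daemonset/kube-proxy &&
    kubectl -n kube-system rollout status daemonset/kube-proxy --timeout=120s
}

# Example:
#   restart_kube_proxy
```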

etcd port 2381 unreachable

Solution: modify etcd.yaml

Configuring Grafana

After helm install, Grafana is already installed. Change the type of the prometheus-stack-grafana service to NodePort, then look up the Grafana admin password:

kubectl get secret prometheus-stack-grafana -n prometheus -o jsonpath="{.data.admin-pass
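
Values under .data in a Secret are base64-encoded, so whatever field is read has to be decoded before use. The helper below is a generic sketch: get_secret_field is a hypothetical name, and the key admin-password in the example is an assumption, since the command above is cut off before the key name ends.

```shell
# Hypothetical helper: read one field of a Secret and decode it from base64.
# Arguments: namespace, secret name, key under .data.
get_secret_field() {
  ns="$1"; secret="$2"; key="$3"
  kubectl -n "$ns" get secret "$secret" -o "jsonpath={.data.$key}" | base64 --decode
}

# Example (the key name admin-password is an assumption):
#   get_secret_field prometheus prometheus-stack-grafana admin-password
```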


Author: 洪宇轩
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 洪宇轩 when reposting.