Preparation
GitHub repository:
https://github.com/prometheus-operator/prometheus-operator
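Installation options:
- Install from the YAML manifests in the source repository
- Install with Helm: for Kubernetes v1.21 and later, the current Helm chart is at https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
Installing prometheus-operator
This article installs prometheus-operator with Helm.
Downloading the chart package
- Add the Helm repository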
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
- Search for the chart. The first result, kube-prometheus-stack, is the current chart for prometheus-operator.
helm search repo prometheus
NAME CHART VERSION APP VERSION DESCRIPTION
prometheus-community/kube-prometheus-stack 42.2.0 0.60.1 kube-prometheus-stack collects Kubernetes manif...
prometheus-community/prometheus 19.0.0 v2.40.5 Prometheus is a monitoring system and time seri...
prometheus-community/prometheus-adapter 3.4.2 v0.10.0 A Helm chart for k8s prometheus adapter
……
- Download the chart
helm pull prometheus-community/kube-prometheus-stack --version=42.2.0
Configuring values.yaml
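As one possible starting point, the chart defaults can be dumped with helm show values kube-prometheus-stack-42.2.0.tgz, and only the keys that differ kept in a small override file. The sketch below writes such a file. The file name values-override.yaml is hypothetical, and the keys prometheus.service.type and grafana.service.type are assumed to exist in chart version 42.2.0, so they are worth checking against the dumped defaults first.

```shell
# Write a minimal override file (hypothetical name) next to the chart archive.
# Assumption: these keys exist in kube-prometheus-stack 42.2.0; verify with
#   helm show values kube-prometheus-stack-42.2.0.tgz
cat > values-override.yaml <<'EOF'
prometheus:
  service:
    type: NodePort      # expose the Prometheus web UI on a node port
grafana:
  service:
    type: NodePort      # expose Grafana on a node port
EOF

# Show what would be passed to Helm.
cat values-override.yaml
```

The file could then be supplied at install time by appending -f values-override.yaml to the helm install command.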
Installing the chart
kubectl create ns prometheus
helm install prometheus-stack kube-prometheus-stack-42.2.0.tgz -n prometheus
List the resources in the prometheus namespace:
$ kubectl get all -n prometheus
NAME READY STATUS RESTARTS AGE
pod/alertmanager-prometheus-stack-kube-prom-alertmanager-0 2/2 Running 1 (32s ago) 103s
pod/prometheus-prometheus-stack-kube-prom-prometheus-0 2/2 Running 0 102s
pod/prometheus-stack-grafana-67f9c54566-cqqdg 3/3 Running 0 107s
pod/prometheus-stack-kube-prom-admission-patch-4z5ks 0/1 CrashLoopBackOff 3 (36s ago) 102s
pod/prometheus-stack-kube-prom-operator-689885654-c2znh 1/1 Running 0 107s
pod/prometheus-stack-kube-state-metrics-59fbfbfd5f-mjjjn 0/1 ImagePullBackOff 0 107s
pod/prometheus-stack-prometheus-node-exporter-bdxrq 1/1 Running 0 107s
pod/prometheus-stack-prometheus-node-exporter-j8fpn 1/1 Running 0 107s
pod/prometheus-stack-prometheus-node-exporter-jf65w 1/1 Running 0 107s
pod/prometheus-stack-prometheus-node-exporter-rms29 1/1 Running 0 107s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 103s
service/prometheus-operated ClusterIP None <none> 9090/TCP 102s
service/prometheus-stack-grafana ClusterIP 10.100.46.155 <none> 80/TCP 107s
service/prometheus-stack-kube-prom-alertmanager ClusterIP 10.110.149.34 <none> 9093/TCP 107s
service/prometheus-stack-kube-prom-operator ClusterIP 10.105.139.158 <none> 443/TCP 107s
service/prometheus-stack-kube-prom-prometheus ClusterIP 10.109.41.102 <none> 9090/TCP 107s
service/prometheus-stack-kube-state-metrics ClusterIP 10.96.77.114 <none> 8080/TCP 107s
service/prometheus-stack-prometheus-node-exporter ClusterIP 10.98.46.213 <none> 9100/TCP 107s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-stack-prometheus-node-exporter 4 4 4 4 4 <none> 107s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-stack-grafana 1/1 1 1 107s
deployment.apps/prometheus-stack-kube-prom-operator 1/1 1 1 107s
deployment.apps/prometheus-stack-kube-state-metrics 0/1 1 0 107s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-stack-grafana-67f9c54566 1 1 1 107s
replicaset.apps/prometheus-stack-kube-prom-operator-689885654 1 1 1 107s
replicaset.apps/prometheus-stack-kube-state-metrics-59fbfbfd5f 1 1 0 107s
NAME READY AGE
statefulset.apps/alertmanager-prometheus-stack-kube-prom-alertmanager 1/1 103s
statefulset.apps/prometheus-prometheus-stack-kube-prom-prometheus 1/1 102s
NAME COMPLETIONS DURATION AGE
job.batch/prometheus-stack-kube-prom-admission-patch 0/1 102s 102s
Two pods fail to start. kubectl describe shows that kube-state-metrics cannot pull its image, so edit deployment.apps/prometheus-stack-kube-state-metrics directly.
Change the image from
image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0
to
image: bitnami/kube-state-metrics:2.7.0
After saving, the Deployment rolls out a new pod automatically. After a short wait the pod is running, and kube-prom-admission-patch also recovers on its own.
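A change made with kubectl edit lives only in the cluster and may be reverted by a later helm upgrade, which re-renders the Deployment from the chart. A minimal sketch of pinning the replacement image through values instead follows. The file name ksm-image.yaml is hypothetical, and the keys kube-state-metrics.image.repository and kube-state-metrics.image.tag are an assumption about the subchart bundled with 42.2.0 (newer subchart versions may split out a separate registry key).

```shell
# Assumption: the kube-state-metrics subchart reads image.repository and image.tag.
cat > ksm-image.yaml <<'EOF'
kube-state-metrics:
  image:
    repository: bitnami/kube-state-metrics
    tag: "2.7.0"
EOF

# Show the override for review.
cat ksm-image.yaml
```

It could be applied with helm upgrade prometheus-stack kube-prometheus-stack-42.2.0.tgz -n prometheus -f ksm-image.yaml, together with any other -f files used at install time.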
Viewing the CRDs
The operator installs the following CRDs:
$ kubectl get crd
NAME CREATED AT
alertmanagerconfigs.monitoring.coreos.com 2022-12-04T13:46:48Z
alertmanagers.monitoring.coreos.com 2022-12-04T13:46:48Z
podmonitors.monitoring.coreos.com 2022-12-04T13:46:48Z
probes.monitoring.coreos.com 2022-12-04T13:46:48Z
prometheuses.monitoring.coreos.com 2022-12-04T13:46:48Z
prometheusrules.monitoring.coreos.com 2022-12-04T13:46:48Z
servicemonitors.monitoring.coreos.com 2022-12-04T13:46:48Z
thanosrulers.monitoring.coreos.com 2022-12-04T13:46:49Z
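Installing Prometheus through the operator
After helm install, a Prometheus CR is created automatically in the prometheus namespace, and the operator deploys Prometheus from it. To view it: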
$ kubectl get Prometheus -n prometheus
NAME VERSION DESIRED READY RECONCILED AVAILABLE AGE
prometheus-stack-kube-prom-prometheus v2.39.1 1 1 True True 12m
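If none was installed, or a separate Prometheus is needed in another namespace (useful for non-admin users who cannot access the Prometheus installed by the cluster administrator), submit a CR like the following: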
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-stack
    meta.helm.sh/release-namespace: prometheus
  creationTimestamp: "2023-11-25T13:19:58Z"
  generation: 1
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/instance: prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 42.2.0
    chart: kube-prometheus-stack-42.2.0
    heritage: Helm
    release: prometheus-stack
  name: prometheus-1
  namespace: prometheus
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: prometheus-stack-kube-prom-alertmanager
      namespace: prometheus
      pathPrefix: /
      port: http-web
  enableAdminAPI: true
  evaluationInterval: 30s
  externalUrl: http://prometheus-1.prometheus:9090
  hostNetwork: false
  image: quay.io/prometheus/prometheus:v2.39.1
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus-stack
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: prometheus-stack
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: prometheus-stack
  scrapeInterval: 30s
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-stack-kube-prom-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-stack
  shards: 1
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 2Gi
        storageClassName: nfs-storage
  version: v2.39.1
  walCompression: true
The remaining Prometheus CR fields can be inspected with the explain command:
kubectl explain Prometheus
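For a separate instance in another namespace, a much smaller manifest is often enough, because fields such as creationTimestamp and generation are filled in by the API server. The sketch below is a starting point under stated assumptions rather than a tested configuration: the namespace team-a, the name prometheus-team-a, the file name, the ServiceAccount, and the team label are all hypothetical, and the ServiceAccount still needs RBAC permissions to discover and scrape targets.

```shell
# Write a trimmed-down Prometheus CR to a file (all names are hypothetical).
cat > prometheus-team-a.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-team-a                  # hypothetical name
  namespace: team-a                        # hypothetical namespace
spec:
  replicas: 1
  version: v2.39.1                         # same version as the CR above
  serviceAccountName: prometheus-team-a    # hypothetical; must exist and have scrape RBAC
  retention: 10d
  scrapeInterval: 30s
  # serviceMonitorNamespaceSelector is left unset, so only ServiceMonitors
  # in this CR's own namespace are selected.
  serviceMonitorSelector:
    matchLabels:
      team: team-a                         # hypothetical label on the ServiceMonitors
EOF
```

The file could be submitted with kubectl apply -f prometheus-team-a.yaml. The operator must be watching that namespace for the CR to be reconciled.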
Exposing the service
The Prometheus services are installed as ClusterIP by default. They can be exposed through Ingress, NodePort, or LoadBalancer. For example, with NodePort:
kubectl edit svc prometheus-stack-kube-prom-prometheus -n prometheus
Change type: ClusterIP to type: NodePort, then check the assigned port:
$ kubectl get svc -n prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 25m
prometheus-operated ClusterIP None <none> 9090/TCP 25m
prometheus-stack-grafana ClusterIP 10.100.46.155 <none> 80/TCP 25m
prometheus-stack-kube-prom-alertmanager ClusterIP 10.110.149.34 <none> 9093/TCP 25m
prometheus-stack-kube-prom-operator ClusterIP 10.105.139.158 <none> 443/TCP 25m
prometheus-stack-kube-prom-prometheus NodePort 10.109.41.102 <none> 9090:30633/TCP 25m
prometheus-stack-kube-state-metrics ClusterIP 10.96.77.114 <none> 8080/TCP 25m
prometheus-stack-prometheus-node-exporter ClusterIP 10.98.46.213 <none> 9100/TCP 25m
The Prometheus page can then be opened at:
http://192.168.126.100:30633
Post-deployment issues
kubelet port 10250 unreachable
Fix:
- In kube-controller-manager.yaml, change --bind-address to 0.0.0.0
- In kube-scheduler.yaml, change --bind-address to 0.0.0.0
This resolved the problem.
kube-proxy port 10249 unreachable
Fix:
kubectl edit configmap kube-proxy -n kube-system
Change metricsBindAddress to metricsBindAddress: 0.0.0.0:10249
Restart the kube-proxy pods:
kubectl get pods -n kube-system | grep kube-proxy |awk '{print $1}'|xargs kubectl delete pods -n kube-system
etcd port 2381 unreachable
Fix: modify etcd.yaml
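Which setting to change depends on the cluster. On kubeadm clusters, one common cause (an assumption here, not confirmed for this setup) is that etcd serves metrics only on the loopback address via --listen-metrics-urls=http://127.0.0.1:2381, so Prometheus cannot reach it from another pod. The helper below, with the hypothetical name expose_etcd_metrics, rewrites that flag to bind on all interfaces.

```shell
# Rewrite etcd's metrics listener from loopback to all interfaces.
# Assumption: the manifest contains the kubeadm default flag
#   --listen-metrics-urls=http://127.0.0.1:2381
expose_etcd_metrics() {
  manifest="$1"
  tmp="$(mktemp)"
  # Edit through a temp file outside the manifest directory so that the
  # kubelet never sees a stray backup copy as a second static Pod manifest.
  sed 's#--listen-metrics-urls=http://127\.0\.0\.1:2381#--listen-metrics-urls=http://0.0.0.0:2381#' \
    "$manifest" > "$tmp" && cat "$tmp" > "$manifest"
  rm -f "$tmp"
  # Print the resulting flag for a quick visual check.
  grep -e '--listen-metrics-urls' "$manifest"
}
```

On a control-plane node this would be run as root, for example expose_etcd_metrics /etc/kubernetes/manifests/etcd.yaml (the kubeadm default path, which may differ elsewhere); the kubelet then restarts the etcd static Pod. Binding to 0.0.0.0 exposes unauthenticated metrics on every interface, so restricting access with a firewall rule may be preferable.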
Configuring Grafana
After helm install, Grafana is already installed. Change the prometheus-stack-grafana service type to NodePort, then retrieve the Grafana admin password:
kubectl get secret prometheus-stack-grafana -n prometheus -o jsonpath="{.data.admin-pass