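Depending on whether you write code, there are two ways to customize a scheduler:
- Without writing code: adjust and combine the existing default plugins to define a new scheduler.
- By writing code: implement the plugin interfaces and develop a custom scheduler.
This article covers the first approach: quickly defining a new scheduler by adjusting the default plugins.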
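Custom scheduler example
The default plugin NodeResourcesFit has three scoring strategies:
- LeastAllocated (default): prefers the nodes with the lowest resource utilization.
- MostAllocated: prefers the nodes with higher resource utilization, to maximize how densely each node is used.
- RequestedToCapacityRatio: scores nodes with a configurable function of the requested-to-capacity ratio, which can be shaped to balance resource utilization across nodes.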
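The third strategy is the least self-explanatory, so here is a minimal sketch of how it could be configured. The profile name my-ratio-scheduler and the shape points are illustrative assumptions, not part of the original setup; the field names follow the kubescheduler.config.k8s.io/v1 API.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-ratio-scheduler # hypothetical name, for illustration only
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: RequestedToCapacityRatio
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
            requestedToCapacityRatio:
              # Each point maps a utilization percentage to a score (0-10).
              # This illustrative shape gives higher scores to fuller nodes.
              shape:
                - utilization: 0
                  score: 0
                - utilization: 100
                  score: 10
```

Reversing the two scores (10 at utilization 0, 0 at utilization 100) would instead favor emptier nodes.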
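The default plugin VolumeBinding has a default volume binding timeout of 600 seconds.
In this example we define a custom scheduler that sets the NodeResourcesFit scoring strategy to MostAllocated and the VolumeBinding timeout to 60 seconds.
Configuring KubeSchedulerConfiguration
First, define a custom scheduler named my-custom-scheduler through a KubeSchedulerConfiguration object: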
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-custom-scheduler # scheduler name
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 1
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
      - name: VolumeBinding
        args:
          bindTimeoutSeconds: 60
A KubeSchedulerConfiguration object is essentially the configuration file of kube-scheduler. To make it easy to use during deployment, wrap its content in a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-custom-scheduler # scheduler name
        plugins:
          score:
            enabled:
              - name: NodeResourcesFit
                weight: 1
        pluginConfig:
          - name: NodeResourcesFit
            args:
              scoringStrategy:
                type: MostAllocated
                resources:
                  - name: cpu
                    weight: 1
                  - name: memory
                    weight: 1
          - name: VolumeBinding
            args:
              bindTimeoutSeconds: 60
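Create a ServiceAccount and bind a ClusterRole
The ServiceAccount must live in kube-system, because both the binding and the Deployment below refer to it in that namespace:
kubectl create sa my-scheduler -n kube-system
kubectl create clusterrolebinding my-scheduler --clusterrole cluster-admin --serviceaccount="kube-system:my-scheduler"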
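For readers who prefer to keep everything in manifests, the two commands above can be sketched declaratively as follows. This is an assumed equivalent rather than something taken from the original setup; the object names mirror the commands.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system # must match the namespace used in the binding and the Deployment
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # same broad role as in the command above
subjects:
  - kind: ServiceAccount
    name: my-scheduler
    namespace: kube-system
```

cluster-admin grants far more than a scheduler needs. The official multiple-schedulers guide binds the built-in system:kube-scheduler and system:volume-scheduler ClusterRoles instead; whether those are sufficient for this exact deployment has not been verified here.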
Deploying the custom scheduler with a Deployment
Next, we need to deploy the custom scheduler. To do so, define a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-custom-kube-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      component: my-custom-kube-scheduler
  template:
    metadata:
      labels:
        component: my-custom-kube-scheduler
    spec:
      serviceAccountName: my-scheduler
      hostAliases:
        - ip: "192.168.126.188"
          hostnames:
            - apiserver.cluster.local
      containers:
        - command:
            - kube-scheduler
            - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
            - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
            - --bind-address=0.0.0.0
            - --kubeconfig=/etc/kubernetes/scheduler.conf
            - --leader-elect=false
            - --config=/etc/kubernetes/my-scheduler-config.yaml
            - -v=5
          image: registry.k8s.io/kube-scheduler:v1.29.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 8
            httpGet:
              path: /healthz
              port: 10259
              scheme: HTTPS
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 15
          name: kube-scheduler
          resources:
            requests:
              cpu: 100m
          startupProbe:
            failureThreshold: 24
            httpGet:
              path: /healthz
              port: 10259
              scheme: HTTPS
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 15
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/kubernetes/scheduler.conf
              name: kubeconfig
              readOnly: true
            - mountPath: /etc/localtime
              name: localtime
              readOnly: true
            - name: my-scheduler-config
              mountPath: /etc/kubernetes/my-scheduler-config.yaml
              subPath: my-scheduler-config.yaml
      volumes:
        - hostPath:
            path: /etc/kubernetes/scheduler.conf
            type: FileOrCreate
          name: kubeconfig
        - hostPath:
            path: /etc/localtime
            type: File
          name: localtime
        - name: my-scheduler-config
          configMap:
            name: my-scheduler-config
      nodeName: k8s-master
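A few things to note:
- The manifest is mostly copied from the YAML of the existing scheduler.
- The startup flag --kubeconfig=/etc/kubernetes/scheduler.conf did not take effect; the log reported "Neither --kubeconfig nor --master was specified. Using the inClusterConfig." This is consistent with the kube-scheduler flag reference, which describes --kubeconfig as deprecated and ignored when a config file is specified with --config; in that case the kubeconfig path has to be set through clientConnection.kubeconfig in the KubeSchedulerConfiguration. The workaround used here was to create the new service account my-scheduler (see the previous step) and reference it in the Deployment with serviceAccountName: my-scheduler.
- The ConfigMap my-scheduler-config must be mounted into the workload at /etc/kubernetes/my-scheduler-config.yaml, and the flag --config=/etc/kubernetes/my-scheduler-config.yaml must be added at startup.
- The default scheduler uses hostNetwork: true, but this custom scheduler must not use hostNetwork, otherwise it fails with a port conflict.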
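As an alternative to the service-account workaround for the ignored --kubeconfig flag, the kubeconfig path can be declared in the configuration file itself. The following is a sketch under the assumption that scheduler.conf stays mounted at the same path as in the Deployment above; it has not been tested against this particular cluster.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# Assumption: this path matches the hostPath mount of scheduler.conf in the Deployment.
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
# Leader election can also be set here instead of through --leader-elect=false.
leaderElection:
  leaderElect: false
profiles:
  - schedulerName: my-custom-scheduler
    # plugins and pluginConfig stay the same as in the configuration shown earlier
```

With this in place the scheduler would talk to the API server using the credentials in scheduler.conf rather than the pod's service account, so the two approaches differ in which identity and permissions the custom scheduler runs with.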
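The custom scheduler can now be started.
At this point you will surely ask: does the self-deployed custom scheduler conflict with the existing default scheduler? It does not. As long as the schedulerName values differ, each scheduler only handles the pods that name it in spec.schedulerName, so the two run independently. Deploying the custom scheduler separately also avoids touching the default scheduler in any way.
Verification
We deploy two pods, one using the default scheduler default-scheduler and one using our custom scheduler my-custom-scheduler: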
apiVersion: v1
kind: Pod
metadata:
  name: nginx-default
spec:
  schedulerName: default-scheduler
  containers:
    - image: registry.cn-beijing.aliyuncs.com/fpf_devops/nginx:1.24
      name: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom
spec:
  schedulerName: my-custom-scheduler
  containers:
    - image: registry.cn-beijing.aliyuncs.com/fpf_devops/nginx:1.24
      name: nginx
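Then watch the logs of the custom scheduler:
As the screenshot shows, the custom scheduler works correctly: it scheduled the pod successfully and did not interfere with the default scheduler.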
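NodeResourcesFit scores nodes based on the resources that pods request, so a test pod with explicit requests makes the effect of MostAllocated easier to reason about than the request-less pods above. A minimal sketch; the pod name and the request values are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom-requests # hypothetical name, for illustration only
spec:
  schedulerName: my-custom-scheduler
  containers:
    - name: nginx
      image: registry.cn-beijing.aliyuncs.com/fpf_devops/nginx:1.24
      resources:
        requests:
          cpu: 250m     # illustrative value
          memory: 128Mi # illustrative value
```

Creating several such pods and comparing where they land against pods scheduled by default-scheduler should show whether the custom profile tends to pack pods onto already-utilized nodes.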