K8s Scheduler Development Series, Part 3: Building a Custom filter Scheduler


Depending on whether we write code, there are two ways to build a custom scheduler:

  • Write no code: recombine the existing default plugins to define a new scheduler
  • Write code: implement the plugin interfaces and develop a scheduler of our own

This article covers the second approach by writing a filter-type scheduler.

As explained in the earlier article In-Depth Analysis of the Scheduler Internals, implementing a filter-type scheduler mainly means implementing two interfaces, FilterPlugin and Plugin:

// https://github.com/kubernetes/kubernetes/blob/v1.28.4/pkg/scheduler/framework/interface.go#L349C1-L367C2

// FilterPlugin is an interface for Filter plugins. These plugins are called at the
// filter extension point for filtering out hosts that cannot run a pod.
// This concept used to be called 'predicate' in the original scheduler.
// These plugins should return "Success", "Unschedulable" or "Error" in Status.code.
// However, the scheduler accepts other valid codes as well.
// Anything other than "Success" will lead to exclusion of the given host from running the pod.
type FilterPlugin interface {
    Plugin
    // Filter is called by the scheduling framework.
    // All FilterPlugins should return "Success" to declare that
    // the given node fits the pod. If Filter doesn't return "Success",
    // it will return "Unschedulable", "UnschedulableAndUnresolvable" or "Error".
    // For the node being evaluated, Filter plugins should look at the passed
    // nodeInfo reference for this particular node's information (e.g., pods
    // considered to be running on the node) instead of looking it up in the
    // NodeInfoSnapshot because we don't guarantee that they will be the same.
    // For example, during preemption, we may pass a copy of the original
    // nodeInfo object that has some pods removed from it to evaluate the
    // possibility of preempting them to schedule the target pod.
    Filter(ctx context.Context, state *CycleState, pod *v1.Pod, nodeInfo *NodeInfo) *Status
}

// Plugin is the parent type for all the scheduling framework plugins.
type Plugin interface {
    Name() string
}

In other words, we need to implement two methods: Filter and Name.

Next, let's write a simple scheduler that filters out Nodes labeled with hello (specifically, the label node=hello).

Writing the Custom Scheduler

Create an empty directory scheduler/node-filter-label and initialize the module:

mkdir -p scheduler/node-filter-label
cd scheduler && go mod init myscheduler

Create a manifests directory to hold the KubeSchedulerConfiguration and a plugins directory for the plugin code. The final layout looks like this:

tree scheduler/
scheduler/
├── go.mod
├── go.sum
├── node-filter-label
│   ├── main.go
│   ├── manifests
│   │   └── nodelabelfilter.yaml
│   └── plugins
│       └── node_filter_label.go

Writing node_filter_label.go

package plugins

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog/v2"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// SchedulerName is the name this plugin registers under; it must match the
// plugin name referenced in the KubeSchedulerConfiguration.
const SchedulerName = "NodeFilterLabel"

// NodeFilterLabel filters out nodes that carry the label node=hello.
type NodeFilterLabel struct{}

func (pl *NodeFilterLabel) Name() string {
	return SchedulerName
}

// Filter rejects any node whose labels include node=hello.
func (pl *NodeFilterLabel) Filter(ctx context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	for k, v := range nodeInfo.Node().ObjectMeta.Labels {
		if k == "node" && v == "hello" {
			klog.InfoS("node failed to pass NodeFilterLabel filter", "pod_name", pod.Name, "current node", nodeInfo.Node().Name)
			return framework.NewStatus(framework.UnschedulableAndUnresolvable, "node has label node=hello")
		}
	}
	klog.InfoS("node pass NodeFilterLabel filter", "pod_name", pod.Name, "current node", nodeInfo.Node().Name)
	return nil
}

// New constructs the plugin; its signature matches the scheduler framework's
// plugin factory, so it can be passed to app.WithPlugin in main.go.
func New(_ context.Context, _ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
	return &NodeFilterLabel{}, nil
}
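
As a cheap sanity check, compile-time assertions can be added to the same file; the build then fails immediately if the struct ever stops satisfying the framework interfaces:

// Compile-time interface checks.
var _ framework.Plugin = &NodeFilterLabel{}
var _ framework.FilterPlugin = &NodeFilterLabel{}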

Writing main.go

package main

import (
	"os"

	"myscheduler/plugins"

	"k8s.io/component-base/cli"
	"k8s.io/component-base/logs"
	_ "k8s.io/component-base/metrics/prometheus/clientgo"
	_ "k8s.io/component-base/metrics/prometheus/version" // for version metric registration
	"k8s.io/kubernetes/cmd/kube-scheduler/app"
)

func main() {
	command := app.NewSchedulerCommand(app.WithPlugin(plugins.SchedulerName, plugins.New))
	logs.InitLogs()
	defer logs.FlushLogs()

	code := cli.Run(command)
	os.Exit(code)
}
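
Incidentally, app.WithPlugin can be passed multiple times if one binary should carry several out-of-tree plugins; the second registration below is a hypothetical illustration:

command := app.NewSchedulerCommand(
	app.WithPlugin(plugins.SchedulerName, plugins.New),
	// app.WithPlugin("AnotherFilter", other.New), // hypothetical second plugin
)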

Writing the KubeSchedulerConfiguration

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
profiles:
- schedulerName: nodelabelfilter
  plugins:
    filter:
      enabled:
      - name: NodeFilterLabel
      disabled:
      - name: "*"
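
The disabled "*" entry switches off every built-in filter plugin (NodeResourcesFit, TaintToleration, and so on), so only our label check runs. To keep the defaults and merely append the custom filter, a sketch of that variant:

profiles:
- schedulerName: nodelabelfilter
  plugins:
    filter:
      enabled:
      - name: NodeFilterLabel  # appended after the default filter plugins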

Compiling

Note that the go.mod and go.sum generated by go mod tidy cannot be used directly: the build fails with unknown revision v0.0.0 errors. The cause is that k8s.io/kubernetes declares its staging repositories as require k8s.io/foo v0.0.0 and resolves them through replace directives, which downstream modules do not inherit.

The practical fix is to locate the branch of kubernetes-sigs/scheduler-plugins that matches the target Kubernetes version (e.g. 1.29) and copy its go.mod over.
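
Alternatively, the replace directives can be pinned by hand. A minimal go.mod sketch for Kubernetes v1.29.0 (staging modules are tagged v0.29.0; the real list in scheduler-plugins' go.mod is much longer):

require k8s.io/kubernetes v1.29.0

replace (
	k8s.io/api => k8s.io/api v0.29.0
	k8s.io/apimachinery => k8s.io/apimachinery v0.29.0
	k8s.io/client-go => k8s.io/client-go v0.29.0
	k8s.io/component-base => k8s.io/component-base v0.29.0
	k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.29.0
	// ...plus the remaining k8s.io/* staging modules
)

Then build the binary: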

go build -o nodelabelfilter main.go

Running Locally for Debugging

Once the code is ready, you can debug it by running the binary directly on any machine; there is no need to package it as a workload or Pod inside Kubernetes. The only requirements are that the machine has a scheduler.conf (it can be copied from a kube-master node) and can reach the cluster.

Note that if you run it on a kube-master node, it may conflict with the port already used by the default scheduler there.
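
If the ports clash, the HTTPS serving port (10259 by default) can be moved; the --secure-port flag is still accepted by recent kube-scheduler releases, although it is marked deprecated:

./nodelabelfilter --leader-elect=false --secure-port=10260 --config nodelabelfilter.yaml

Otherwise, start it as-is: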

./nodelabelfilter --leader-elect=false --config nodelabelfilter.yaml
I1117 09:03:19.700978   25785 serving.go:380] Generated self-signed cert in-memory
W1117 09:03:20.057352   25785 authentication.go:339] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W1117 09:03:20.057402   25785 authentication.go:363] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W1117 09:03:20.057418   25785 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I1117 09:03:20.073613   25785 server.go:154] "Starting Kubernetes Scheduler" version="v0.0.0-master+$Format:%H$"
I1117 09:03:20.073977   25785 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I1117 09:03:20.079517   25785 secure_serving.go:213] Serving securely on [::]:10259
I1117 09:03:20.081147   25785 tlsconfig.go:240] "Starting DynamicServingCertificateController"

Testing

Assume the cluster has one master and two worker nodes. Add the label node=hello to the master and to node2:

kubectl label nodes k8s-master node=hello
kubectl label nodes k8s-node2 node=hello
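
A quick way to confirm which nodes now carry the label:

kubectl get nodes -l node=hello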

Create a Pod and watch how it gets scheduled:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-labelfilter
spec:
  schedulerName: nodelabelfilter
  containers:
  - image: registry.cn-beijing.aliyuncs.com/fpf_devops/nginx:1.24
    name: nginx
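
Apply the manifest (the filename below is an assumption) and check where the Pod lands:

kubectl apply -f nginx-labelfilter.yaml
kubectl get pod nginx-labelfilter -o wide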

Check the custom plugin's logs:

I1117 09:08:32.807891   25785 node_filter_label.go:27] "node pass NodeFilterLabel filter" pod_name="nginx-labelfilter" current node="k8s-node1"
I1117 09:08:32.807932   25785 node_filter_label.go:23] "node failed to pass NodeFilterLabel filter" pod_name="nginx-labelfilter" current node="k8s-node2"
I1117 09:08:32.807888   25785 node_filter_label.go:23] "node failed to pass NodeFilterLabel filter" pod_name="nginx-labelfilter" current node="k8s-master"

As expected, the master and node2 are filtered out as unschedulable, and the Pod can only be placed on node1.

Deploying the Custom Scheduler

Building the Image

Since v1.24, Kubernetes has dropped dockershim and typically runs on containerd instead of Docker, so the familiar docker build command is no longer available on the nodes. We need a build command that works with containerd.

containerd has a sub-project called nerdctl that provides a Docker-compatible CLI, letting you manage local images and containers just like the docker command.

nerdctl ships in a minimal and a full distribution:

  • The minimal distribution contains only the nerdctl binary; nerdctl build is unavailable and exits with an error
  • The full distribution contains not only nerdctl but also buildkitd, buildctl, ctr, runc, and containerd, plus the CNI plugin binaries

After downloading and extracting the archive, copy or symlink nerdctl and buildkitd from bin/ into /usr/local/bin/, copy lib/systemd/system/buildkit.service into /etc/systemd/system/, and start buildkitd:

systemctl enable buildkit.service --now
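
If buildkitd is up, the builder should respond; one way to verify:

buildctl debug workers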

Writing the Dockerfile

FROM registry.cn-beijing.aliyuncs.com/fpf_devops/golang:1.23.0
WORKDIR /opt
ADD nodelabelfilter /opt
# the ENTRYPOINT reads this config, so it must be baked into the image as well
ADD manifests/nodelabelfilter.yaml /opt
ENTRYPOINT ["/opt/nodelabelfilter", "--leader-elect=false", "--config=/opt/nodelabelfilter.yaml"]
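
Since the binary is compiled on the host, the golang base image is heavier than necessary; any small glibc-based image would do. A hypothetical slimmer variant (the base image choice is an assumption):

FROM debian:bookworm-slim
WORKDIR /opt
ADD nodelabelfilter /opt
ADD manifests/nodelabelfilter.yaml /opt
ENTRYPOINT ["/opt/nodelabelfilter", "--leader-elect=false", "--config=/opt/nodelabelfilter.yaml"]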

Build the image:

nerdctl build -t nodelabelfilter:v1.0.0 .
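
Before wiring it into the cluster, the image can be smoke-tested on a node that already has the kubeconfig (the mount path and host networking here are assumptions of this sketch):

nerdctl run --rm --net host \
  -v /etc/kubernetes/scheduler.conf:/etc/kubernetes/scheduler.conf:ro \
  nodelabelfilter:v1.0.0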

Deploying

Deploy the image as a workload in the cluster; see the follow-up article, Deploying a Custom Scheduler with a Deployment.

