Master Node

Before initializing the cluster, kubeadm pulls the required images from Google's image registry. If you cannot access it directly, refer to 《准备 Kubernetes 集群环境》.
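
If you want to check or pre-pull those images before running kubeadm init (the preflight output below also hints at this), you can do so explicitly. The commands below are a minimal sketch, assuming the same v1.25.2 version used in this article:

# List the images kubeadm needs for this Kubernetes version
kubeadm config images list --kubernetes-version v1.25.2

# Pre-pull them so 'kubeadm init' does not have to wait on the network
sudo kubeadm config images pull --kubernetes-version v1.25.2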

sudo kubeadm init \
    --apiserver-advertise-address 10.0.8.81 \
    --apiserver-bind-port 6443 \
    --control-plane-endpoint cluster-endpoint \
    --kubernetes-version v1.25.2 \
    --service-cidr 10.96.0.0/16 \
    --pod-network-cidr 192.168.0.0/16
[init] Using Kubernetes version: v1.25.2
[preflight] Running pre-flight checks
	[WARNING HTTPProxy]: Connection to "https://10.0.8.81" uses proxy "http://10.0.8.18:8234". If that is not intended, adjust your proxy settings
	[WARNING HTTPProxyCIDR]: connection to "192.168.0.0/16" uses proxy "http://10.0.8.18:8234". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [cluster-endpoint kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local s01] and IPs [10.96.0.1 10.0.8.81]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost s01] and IPs [10.0.8.81 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost s01] and IPs [10.0.8.81 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 27.503677 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node s01 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node s01 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: aog7zw.pigdvq7fzg1e4y5w
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join cluster-endpoint:6443 --token aog7zw.pigdvq7fzg1e4y5w \
	--discovery-token-ca-cert-hash sha256:09149deed5c5697105c73c64168dd5d2e2e92fc565e94c04a61792f8012e514c \
	--control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join cluster-endpoint:6443 --token aog7zw.pigdvq7fzg1e4y5w \
	--discovery-token-ca-cert-hash sha256:09149deed5c5697105c73c64168dd5d2e2e92fc565e94c04a61792f8012e514c

Be sure to save the output from a successful cluster initialization; the token in it is needed later when joining nodes. By default a join token is valid for 24 hours; once it expires, you can generate a new one with kubeadm token create --print-join-command.
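
For example, run the following on the Master node to mint a new token and print a ready-to-use worker join command:

# Prints a fresh 'kubeadm join ...' command with a new bootstrap token
sudo kubeadm token create --print-join-command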

Parameter explanations:

  • --apiserver-advertise-address 10.0.8.81: the IP address the API server advertises that it is listening on. If unset, the default network interface is used.
  • --apiserver-bind-port 6443: the port the API server binds to.
  • --control-plane-endpoint cluster-endpoint: a stable IP address or DNS name for the control plane.
  • --kubernetes-version v1.25.2: the specific Kubernetes version to use for the control plane.
  • --service-cidr 10.96.0.0/16: an alternative IP address range for Service virtual IPs (kubeadm's default is 10.96.0.0/12).
  • --pod-network-cidr 192.168.0.0/16: the IP address range for the Pod network. If set, the control plane automatically allocates a CIDR to each node; 192.168.0.0/16 matches Calico's default.

The service-cidr and pod-network-cidr ranges must not overlap each other, and neither may overlap the network that apiserver-advertise-address belongs to; otherwise you may run into some strange problems.
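
As a side note, the same flags can also be expressed as a kubeadm configuration file and passed via kubeadm init --config. The snippet below is only a rough sketch of what an equivalent kubeadm-config.yaml could look like with the v1beta3 config API shipped with kubeadm v1.25:

# kubeadm-config.yaml -- sketch equivalent of the flags above
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.8.81            # --apiserver-advertise-address
  bindPort: 6443                         # --apiserver-bind-port
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.2               # --kubernetes-version
controlPlaneEndpoint: cluster-endpoint   # --control-plane-endpoint
networking:
  serviceSubnet: 10.96.0.0/16            # --service-cidr
  podSubnet: 192.168.0.0/16              # --pod-network-cidr

It would then be applied with sudo kubeadm init --config kubeadm-config.yaml instead of the long flag list.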

Notice the HTTPProxy and HTTPProxyCIDR warnings in the preflight output above. Because installing kubeadm, kubectl, and kubelet requires access to Google's package sources, I had set a proxy in the VM's SSH session with the following commands:

export https_proxy=http://10.0.8.18:8234
export http_proxy=http://10.0.8.18:8234
export all_proxy=socks5://10.0.8.18:8235
export no_proxy=cluster-master,cluster-endpoint,10.96.0.1,localhost,127.0.0.1,::1

This produced a few warnings during the preflight checks. They do not affect cluster initialization and can be ignored, but it is still a good idea to unset these temporary environment variables before initializing:

unset https_proxy http_proxy all_proxy no_proxy

Copy the Kubernetes Config File

Run the commands from the "To start using your cluster" section of the initialization output above:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
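
A quick way to confirm that kubectl can now reach the cluster:

# Should print the control plane and CoreDNS endpoints
kubectl cluster-info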

At this point, if you look at the Pods Kubernetes has started, CoreDNS should be in the Pending state; it cannot start properly until a CNI plugin is installed.
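
You can verify this directly; in a kubeadm cluster the CoreDNS Pods carry the k8s-app=kube-dns label:

# CoreDNS stays Pending until a CNI plugin provides the Pod network
kubectl get pods -n kube-system -l k8s-app=kube-dns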

Install the Calico CNI

The CNI implementation used here is Calico, currently at v3.24.1; see the official documentation for more details. First, install the Tigera operator:

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/tigera-operator.yaml
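
Optionally, wait for the operator to come up before applying the custom resources; it runs in the tigera-operator namespace (visible in the Pod listing later in this article):

# Wait until the tigera-operator Pod is Running
kubectl get pods -n tigera-operator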

If you initialized the cluster with pod-network-cidr left at the default 192.168.0.0/16 range, simply run the command below:

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/custom-resources.yaml

Otherwise, download the custom-resources.yaml file and manually replace the CIDR setting inside it:

curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/custom-resources.yaml
cat custom-resources.yaml
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 192.168.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

---

# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}

Change the cidr value under ipPools so that it matches the pod-network-cidr you used when initializing the cluster, then run:

kubectl create -f custom-resources.yaml
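
Instead of editing the file by hand, you could also patch the CIDR with sed before applying it. The command below is a sketch that assumes a hypothetical pod-network-cidr of 10.244.0.0/16:

# Replace Calico's default Pod CIDR with the one used at 'kubeadm init'
sed -i 's#cidr: 192.168.0.0/16#cidr: 10.244.0.0/16#' custom-resources.yaml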

If you previously configured a proxy for containerd, be sure to remove that proxy setting after running the command above; otherwise the Calico plugin will be unable to fetch cluster information from the Kubernetes Service API, and the Pods will fail to start!
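
If, as in my setup, the containerd proxy was configured through a systemd drop-in (see 《准备 Kubernetes 集群环境》), removing it might look roughly like the following; the drop-in path is an assumption and may differ on your machine:

# Remove the (assumed) proxy drop-in for containerd and restart the service
sudo rm /etc/systemd/system/containerd.service.d/http-proxy.conf
sudo systemctl daemon-reload
sudo systemctl restart containerd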

Check the Cluster's Pod Status

After Calico is installed, use the following command to watch the cluster's Pods (it refreshes every two seconds):

watch kubectl get pods -A

Every 2.0s: kubectl get pods -A                                     s01: Mon Oct  3 03:31:39 2022

NAMESPACE          NAME                                       READY   STATUS    RESTARTS      AGE
calico-apiserver   calico-apiserver-5c5d497dbc-cxb5q          1/1     Running   1 (10m ago)   14h
calico-apiserver   calico-apiserver-5c5d497dbc-gbh6m          1/1     Running   1 (10m ago)   14h
calico-system      calico-kube-controllers-85666c5b94-h7gnj   1/1     Running   1 (10m ago)   14h
calico-system      calico-node-6qrfk                          1/1     Running   1 (10m ago)   14h
calico-system      calico-typha-b84cfb796-ctzx2               1/1     Running   2 (10m ago)   14h
calico-system      calico-typha-b84cfb796-w9t7k               1/1     Running   1 (10m ago)   14h
calico-system      csi-node-driver-c94pg                      2/2     Running   2 (10m ago)   14h
kube-system        coredns-565d847f94-fm848                   1/1     Running   1 (10m ago)   15h
kube-system        coredns-565d847f94-tbhr2                   1/1     Running   1 (10m ago)   15h
kube-system        etcd-s01                                   1/1     Running   1 (10m ago)   15h
kube-system        kube-apiserver-s01                         1/1     Running   1 (10m ago)   15h
kube-system        kube-controller-manager-s01                1/1     Running   1 (10m ago)   15h
kube-system        kube-proxy-kmvzb                           1/1     Running   1 (10m ago)   15h
kube-system        kube-scheduler-s01                         1/1     Running   1 (10m ago)   15h
tigera-operator    tigera-operator-6675dc47f4-7w8gm           1/1     Running   1 (10m ago)   14h

When every Pod's STATUS is Running, the Master node of the Kubernetes cluster has finished initializing and is up.

Join Worker Nodes

Worker nodes are configured much the same way as the Master node; refer to 《准备 Kubernetes 集群环境》.

After installing containerd, kubeadm, and the other CLIs, run the join command that was printed when the Master node was initialized:

kubeadm join cluster-endpoint:6443 --token aog7zw.pigdvq7fzg1e4y5w \
    --discovery-token-ca-cert-hash sha256:09149deed5c5697105c73c64168dd5d2e2e92fc565e94c04a61792f8012e514c

At this point the node pulls the required images and starts its Pods. You can then go to the Master node and watch the Pods; once they are all in the Running state, the node has joined successfully:

watch kubectl get pods -A

Every 2.0s: kubectl get pods -A                                     s01: Mon Oct  3 03:31:39 2022

NAMESPACE          NAME                                       READY   STATUS    RESTARTS      AGE
calico-apiserver   calico-apiserver-5c5d497dbc-cxb5q          1/1     Running   1 (26m ago)   15h
calico-apiserver   calico-apiserver-5c5d497dbc-gbh6m          1/1     Running   1 (26m ago)   15h
calico-system      calico-kube-controllers-85666c5b94-h7gnj   1/1     Running   1 (26m ago)   15h
calico-system      calico-node-6qrfk                          1/1     Running   1 (26m ago)   14h
calico-system      calico-node-cnwdl                          1/1     Running   1 (26m ago)   15h
calico-system      calico-node-w8p2h                          1/1     Running   1 (26m ago)   14h
calico-system      calico-typha-b84cfb796-ctzx2               1/1     Running   2 (26m ago)   14h
calico-system      calico-typha-b84cfb796-w9t7k               1/1     Running   1 (26m ago)   15h
calico-system      csi-node-driver-c94pg                      2/2     Running   2 (26m ago)   15h
calico-system      csi-node-driver-jmmg2                      2/2     Running   2 (26m ago)   14h
calico-system      csi-node-driver-qmvpx                      2/2     Running   2 (26m ago)   14h
kube-system        coredns-565d847f94-fm848                   1/1     Running   1 (26m ago)   15h
kube-system        coredns-565d847f94-tbhr2                   1/1     Running   1 (26m ago)   15h
kube-system        etcd-s01                                   1/1     Running   1 (26m ago)   15h
kube-system        kube-apiserver-s01                         1/1     Running   1 (26m ago)   15h
kube-system        kube-controller-manager-s01                1/1     Running   1 (26m ago)   15h
kube-system        kube-proxy-kmvzb                           1/1     Running   1 (26m ago)   15h
kube-system        kube-proxy-w6swd                           1/1     Running   1 (26m ago)   14h
kube-system        kube-proxy-x7z96                           1/1     Running   1 (26m ago)   14h
kube-system        kube-scheduler-s01                         1/1     Running   1 (26m ago)   15h
tigera-operator    tigera-operator-6675dc47f4-7w8gm           1/1     Running   1 (26m ago)   15h

As you can see, the output above shows all Pods after I joined two more nodes; each node runs two additional Pods, calico-node and csi-node-driver.
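
Both of them are managed as DaemonSets by the Calico operator, so you can also confirm the per-node coverage with:

# calico-node and csi-node-driver run one Pod on every node
kubectl get daemonset -n calico-system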

kubectl get nodes -o wide
NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
s01    Ready    control-plane   15h   v1.25.2   10.0.8.81     <none>        Ubuntu 22.04.1 LTS   5.15.0-48-generic   containerd://1.6.8
s02    Ready    <none>          14h   v1.25.2   10.0.8.82     <none>        Ubuntu 22.04.1 LTS   5.15.0-48-generic   containerd://1.6.8
s03    Ready    <none>          14h   v1.25.2   10.0.8.83     <none>        Ubuntu 22.04.1 LTS   5.15.0-48-generic   containerd://1.6.8

With that, the entire Kubernetes cluster is installed; next up is installing the Dashboard…

I hope this is helpful. Happy hacking…