Initialize the Kubernetes Cluster and Join Nodes
Master Node⌗
Before initializing the cluster, kubeadm pulls the required images from Google's container registry. If you cannot reach it directly, see 《准备 Kubernetes 集群环境》 (Preparing the Kubernetes Cluster Environment).
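If you want to confirm that the images are reachable (or simply save time during init), kubeadm can list and pre-pull them ahead of time; for example:
# List the images required for this Kubernetes version
kubeadm config images list --kubernetes-version v1.25.2
# Pre-pull them so 'kubeadm init' does not have to wait on the network
sudo kubeadm config images pull --kubernetes-version v1.25.2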
sudo kubeadm init \
--apiserver-advertise-address 10.0.8.81 \
--apiserver-bind-port 6443 \
--control-plane-endpoint cluster-endpoint \
--kubernetes-version v1.25.2 \
--service-cidr 10.96.0.0/16 \
--pod-network-cidr 192.168.0.0/16
[init] Using Kubernetes version: v1.25.2
[preflight] Running pre-flight checks
[WARNING HTTPProxy]: Connection to "https://10.0.8.81" uses proxy "http://10.0.8.18:8234". If that is not intended, adjust your proxy settings
[WARNING HTTPProxyCIDR]: connection to "192.168.0.0/16" uses proxy "http://10.0.8.18:8234". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [cluster-endpoint kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local s01] and IPs [10.96.0.1 10.0.8.81]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost s01] and IPs [10.0.8.81 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost s01] and IPs [10.0.8.81 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 27.503677 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node s01 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node s01 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: aog7zw.pigdvq7fzg1e4y5w
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join cluster-endpoint:6443 --token aog7zw.pigdvq7fzg1e4y5w \
--discovery-token-ca-cert-hash sha256:09149deed5c5697105c73c64168dd5d2e2e92fc565e94c04a61792f8012e514c \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join cluster-endpoint:6443 --token aog7zw.pigdvq7fzg1e4y5w \
--discovery-token-ca-cert-hash sha256:09149deed5c5697105c73c64168dd5d2e2e92fc565e94c04a61792f8012e514c
Be sure to save the output of a successful cluster initialization; the token in it will be needed again when joining nodes! By default, a join token is valid for 24 hours. After it expires, you can regenerate the join command with:
kubeadm token create --print-join-command
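You can also check whether an existing token is still valid by listing the bootstrap tokens on the Master:
kubeadm token list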
Parameter explanations:
- --apiserver-advertise-address 10.0.8.81: the IP address the API server advertises as listening on. If unset, the default network interface is used.
- --apiserver-bind-port 6443: the port the API server binds to.
- --control-plane-endpoint cluster-endpoint: a stable IP address or DNS name for the control plane.
- --kubernetes-version v1.25.2: the specific Kubernetes version to use for the control plane.
- --service-cidr 10.96.0.0/16: an alternative IP range for Service virtual IPs (kubeadm's default is 10.96.0.0/12).
- --pod-network-cidr 192.168.0.0/16: the IP range the Pod network may use. When this flag is set, the control plane automatically allocates a CIDR to every node. 192.168.0.0/16 is used here because it matches the default pool in Calico's manifest, installed later.
The service-cidr and pod-network-cidr ranges must not overlap each other, and neither may overlap the network that apiserver-advertise-address belongs to; otherwise you may run into some strange problems.
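If you are ever unsure which defaults kubeadm itself would pick for these settings, you can print its default init configuration (output not shown here):
kubeadm config print init-defaults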
You can see the two proxy warnings (HTTPProxy and HTTPProxyCIDR) near the top of the output above. Because installing kubeadm, kubectl, and kubelet requires access to Google's package sources, I had set a proxy in the VM's SSH session with the following command:
export https_proxy=http://10.0.8.18:8234;export http_proxy=http://10.0.8.18:8234;export all_proxy=socks5://10.0.8.18:8235;export no_proxy=cluster-master,cluster-endpoint,10.96.0.1,localhost,127.0.0.1,::1
This produced a few warnings during the preflight checks. They do not affect cluster initialization and can be ignored, but it is still recommended to unset these temporary environment variables before initializing:
unset https_proxy http_proxy all_proxy no_proxy
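A quick way to confirm that no proxy variables are left in the current shell:
env | grep -i proxy || echo "no proxy variables set"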
Copy the Kubernetes Config File⌗
Run the commands from the "To start using your cluster" block of the successful init output above:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
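With the kubeconfig in place, kubectl should be able to reach the cluster. A quick sanity check:
kubectl cluster-info
kubectl get nodes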
At this point, if you look at the Pods Kubernetes has started, CoreDNS should be in the Pending state; it will not start properly until a CNI is installed.
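For example, the CoreDNS Pods (they carry the k8s-app=kube-dns label) can be checked directly:
kubectl get pods -n kube-system -l k8s-app=kube-dns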
Install the Calico CNI⌗
The CNI implementation I use here is Calico, currently at v3.24.1. See the official documentation for more details:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/tigera-operator.yaml
If the pod-network-cidr you passed at cluster initialization is 192.168.0.0/16 (the range Calico's manifest assumes by default), you can apply the custom resources directly:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/custom-resources.yaml
Otherwise, download the custom-resources.yaml manifest and replace the CIDR setting in it by hand:
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/custom-resources.yaml
cat custom-resources.yaml
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 192.168.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
Change the cidr value under ipPools so that it matches the pod-network-cidr you used when initializing the cluster (a non-interactive alternative is sketched right after the next command), then apply it:
kubectl create -f custom-resources.yaml
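For reference, the same edit can be done non-interactively before applying the manifest. A minimal sketch, assuming your pod-network-cidr were 10.244.0.0/16 (that value is only an example; substitute your own):
# Rewrite the default Calico pool CIDR in place (example target CIDR)
sed -i 's|cidr: 192.168.0.0/16|cidr: 10.244.0.0/16|' custom-resources.yaml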
If you previously configured a proxy for Containerd, be sure to remove that proxy configuration after applying custom-resources.yaml; otherwise the Calico plugin will be unable to reach the Kubernetes Service API to fetch cluster information, and its Pods will fail to start!
Check the Cluster's Pod Status⌗
Once Calico is installed, use the following command to watch the cluster's Pods (refreshed every two seconds):
watch kubectl get pods -A
Every 2.0s: kubectl get pods -A s01: Mon Oct 3 03:31:39 2022
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-5c5d497dbc-cxb5q 1/1 Running 1 (10m ago) 14h
calico-apiserver calico-apiserver-5c5d497dbc-gbh6m 1/1 Running 1 (10m ago) 14h
calico-system calico-kube-controllers-85666c5b94-h7gnj 1/1 Running 1 (10m ago) 14h
calico-system calico-node-6qrfk 1/1 Running 1 (10m ago) 14h
calico-system calico-typha-b84cfb796-ctzx2 1/1 Running 2 (10m ago) 14h
calico-system calico-typha-b84cfb796-w9t7k 1/1 Running 1 (10m ago) 14h
calico-system csi-node-driver-c94pg 2/2 Running 2 (10m ago) 14h
kube-system coredns-565d847f94-fm848 1/1 Running 1 (10m ago) 15h
kube-system coredns-565d847f94-tbhr2 1/1 Running 1 (10m ago) 15h
kube-system etcd-s01 1/1 Running 1 (10m ago) 15h
kube-system kube-apiserver-s01 1/1 Running 1 (10m ago) 15h
kube-system kube-controller-manager-s01 1/1 Running 1 (10m ago) 15h
kube-system kube-proxy-kmvzb 1/1 Running 1 (10m ago) 15h
kube-system kube-scheduler-s01 1/1 Running 1 (10m ago) 15h
tigera-operator tigera-operator-6675dc47f4-7w8gm 1/1 Running 1 (10m ago) 14h
When every Pod shows a STATUS of Running, the Kubernetes Master node has been initialized and is fully up.
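Since Calico was installed through the Tigera operator, its component health can also be checked via the TigeraStatus objects the operator publishes (a quick check; wait until everything reports Available):
kubectl get tigerastatus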
Join Worker Nodes⌗
Worker nodes are configured in much the same way as the Master node; see 《准备 Kubernetes 集群环境》 (Preparing the Kubernetes Cluster Environment).
After installing Containerd, kubeadm, and the other CLI tools, run the join command that the Master node printed at the end of initialization:
kubeadm join cluster-endpoint:6443 --token aog7zw.pigdvq7fzg1e4y5w \
--discovery-token-ca-cert-hash sha256:09149deed5c5697105c73c64168dd5d2e2e92fc565e94c04a61792f8012e514c
The node will then pull the required images and start its Pods. You can watch the Pods from the Master node; once they are all in the Running state, the node has joined successfully:
watch kubectl get pods -A
Every 2.0s: kubectl get pods -A s01: Mon Oct 3 03:31:39 2022
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-5c5d497dbc-cxb5q 1/1 Running 1 (26m ago) 15h
calico-apiserver calico-apiserver-5c5d497dbc-gbh6m 1/1 Running 1 (26m ago) 15h
calico-system calico-kube-controllers-85666c5b94-h7gnj 1/1 Running 1 (26m ago) 15h
calico-system calico-node-6qrfk 1/1 Running 1 (26m ago) 14h
calico-system calico-node-cnwdl 1/1 Running 1 (26m ago) 15h
calico-system calico-node-w8p2h 1/1 Running 1 (26m ago) 14h
calico-system calico-typha-b84cfb796-ctzx2 1/1 Running 2 (26m ago) 14h
calico-system calico-typha-b84cfb796-w9t7k 1/1 Running 1 (26m ago) 15h
calico-system csi-node-driver-c94pg 2/2 Running 2 (26m ago) 15h
calico-system csi-node-driver-jmmg2 2/2 Running 2 (26m ago) 14h
calico-system csi-node-driver-qmvpx 2/2 Running 2 (26m ago) 14h
kube-system coredns-565d847f94-fm848 1/1 Running 1 (26m ago) 15h
kube-system coredns-565d847f94-tbhr2 1/1 Running 1 (26m ago) 15h
kube-system etcd-s01 1/1 Running 1 (26m ago) 15h
kube-system kube-apiserver-s01 1/1 Running 1 (26m ago) 15h
kube-system kube-controller-manager-s01 1/1 Running 1 (26m ago) 15h
kube-system kube-proxy-kmvzb 1/1 Running 1 (26m ago) 15h
kube-system kube-proxy-w6swd 1/1 Running 1 (26m ago) 14h
kube-system kube-proxy-x7z96 1/1 Running 1 (26m ago) 14h
kube-system kube-scheduler-s01 1/1 Running 1 (26m ago) 15h
tigera-operator tigera-operator-6675dc47f4-7w8gm 1/1 Running 1 (26m ago) 15h
As shown above, this is the full list of Pods after I joined two more nodes; each node runs its own calico-node and csi-node-driver Pods.
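To see which node each of these per-node Pods landed on, -o wide can be appended (output omitted here):
kubectl get pods -n calico-system -o wide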
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
s01 Ready control-plane 15h v1.25.2 10.0.8.81 <none> Ubuntu 22.04.1 LTS 5.15.0-48-generic containerd://1.6.8
s02 Ready <none> 14h v1.25.2 10.0.8.82 <none> Ubuntu 22.04.1 LTS 5.15.0-48-generic containerd://1.6.8
s03 Ready <none> 14h v1.25.2 10.0.8.83 <none> Ubuntu 22.04.1 LTS 5.15.0-48-generic containerd://1.6.8
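The worker nodes show <none> under ROLES. If you prefer them to be labeled as workers (purely cosmetic, but it makes the output easier to read), you can add the role label yourself:
kubectl label node s02 node-role.kubernetes.io/worker=
kubectl label node s03 node-role.kubernetes.io/worker=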
At this point the entire Kubernetes cluster is up and running. Next up is installing the Dashboard…
I hope this is helpful, Happy hacking…