A detailed walkthrough of installing a Kubernetes cluster on Ubuntu 24, covering node installation and debugging, network plugin installation, Dashboard installation, and an Nginx deployment test.
Basic Installation
System environment setup
# Disable the firewall, set the timezone, and enable time sync
sudo systemctl disable --now ufw
sudo timedatectl set-timezone Asia/Shanghai
sudo systemctl restart systemd-timesyncd.service
timedatectl status

# Disable swap permanently (required by kubelet)
sudo swapoff -a
sudo sed -i '/swap/d' /etc/fstab

# Check SELinux status
sudo apt install -y policycoreutils
sestatus
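A quick sanity check at this point saves trouble later; a minimal sketch (the grep target assumes the fstab edit above):

swapon --show          # no output means swap is disabled
grep swap /etc/fstab   # should print nothing after the sed above
timedatectl | grep 'Time zone'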
System parameter configuration
# Add host entries for all nodes
sudo vi /etc/hosts
192.168.159.200 master200
192.168.159.201 slave201
192.168.159.202 slave202

# Load the kernel modules Kubernetes networking needs, now and on boot
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Required sysctl settings, persisted across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

# Verify
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
lsmod | grep br_netfilter
lsmod | grep overlay
Installing the containerd runtime
sudo apt install -y containerd
containerd -v

# Generate the default config and switch the cgroup driver to systemd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo vi /etc/containerd/config.toml

sudo systemctl enable --now containerd
sudo systemctl restart containerd
sudo systemctl status containerd

# Common ctr commands for working with images and containers
ctr image pull docker.io/library/nginx:latest
ctr image ls
ctr image rm docker.io/library/nginx:latest
ctr image import image.tar
ctr image export image.tar docker.io/library/nginx:latest
ctr container create docker.io/library/nginx:latest my-nginx
ctr task start my-nginx
ctr task kill my-nginx
ctr container rm my-nginx
ctr task ls
ctr task exec --tty --exec-id shell my-nginx /bin/sh
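Before moving on, it is worth double-checking that the cgroup change stuck and the service is up; a small sketch using the default paths from above:

grep -n 'SystemdCgroup' /etc/containerd/config.toml   # should show: SystemdCgroup = true
systemctl is-active containerd                        # should print: active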
Installing the Kubernetes components
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Add the Kubernetes v1.28 apt repository and its signing key
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl   # prevent accidental upgrades
kubeadm version
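The three binaries should report matching v1.28.x versions; a quick check:

kubelet --version
kubeadm version -o short
kubectl version --client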
Cluster Installation
Initializing the cluster
Run the following command on the master node to pull the control-plane images in advance:
sudo kubeadm config images pull \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.28.15 \
  --cri-socket=unix:///run/containerd/containerd.sock
Run the cluster initialization. Mind the IP address, the version, and the network CIDRs (the Service and Pod networks can keep their defaults, but they must match what the CNI plugin expects later):
sudo kubeadm init \
  --apiserver-advertise-address=192.168.159.200 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.28.15 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --cri-socket=unix:///run/containerd/containerd.sock
Parameter explanations:
apiserver-advertise-address: the address the cluster advertises; use the master node's internal IP.
image-repository: the default registry k8s.gcr.io is unreachable from mainland China, so the Alibaba Cloud mirror is specified instead.
kubernetes-version: the K8s version; it must match the packages installed above.
service-cidr: the cluster Service network segment.
pod-network-cidr: the cluster Pod network segment.
cri-socket: the CRI socket to use; since this setup runs containerd, it is unix:///run/containerd/containerd.sock.
When initialization completes, the output ends with two commands: one to set up kubectl's config and one kubeadm join command. Record both.
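If the join command is lost, it can be regenerated on the master at any time:

sudo kubeadm token create --print-join-command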
On the master node, run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Worker node installation
Run this command on every worker node (substitute your own token and hash):
kubeadm join 192.168.159.200:6443 \
  --token kxzrga.d74axaspi1patvof \
  --discovery-token-ca-cert-hash sha256:062b1ad988e637ae9cbeaacf0a89a35fbb5fe582ebb55768fe0ac7c7e6f2ee45 \
  --cri-socket=unix:///run/containerd/containerd.sock
Inspecting the cluster
kubectl get nodes -o wide
kubectl get nodes
kubectl cluster-info
Network problem

couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
echo "export KUBECONFIG=/etc/kubernetes/kubelet.conf" >> /etc/profile
source /etc/profile
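This error means kubectl has no kubeconfig and is falling back to localhost:8080. On the control-plane node, admin.conf is usually the right file to point at instead:

export KUBECONFIG=/etc/kubernetes/admin.conf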
NotReady (pod)
If some critical pods have not come up, investigate the cause as follows:
sudo kubectl get pods
sudo kubectl get pod --all-namespaces
sudo kubectl get pod -n kube-system
sudo kubectl describe pod coredns-66f779496c-fjq22 -n kube-system
sudo kubectl logs -f -n kube-system coredns-66f779496c-fjq22
sudo kubectl get pod coredns-66f779496c-fjq22 -n kube-system -o yaml
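Cluster events often explain scheduling and image-pull failures more directly than pod status; sorting them by time helps:

sudo kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp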
Checking resources:
kubectl describe node slave201 | grep Taint
sudo kubectl edit deployment coredns -n kube-system
sudo kubectl describe pod coredns-66f779496c-rhnb9 -n kube-system
Problem: coredns Pending

0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

Solution:
# Check what is running and how CoreDNS is configured
sudo kubectl get pod --all-namespaces
sudo kubectl get pod -n kube-system
sudo kubectl get pods -n kube-system | grep coredns
sudo kubectl get svc -n kube-system | grep kube-dns
sudo kubectl edit configmap coredns -n kube-system
sudo kubectl edit deployment coredns -n kube-system
sudo kubectl describe pod coredns-58fbbbd8c5-9f9hl -n kube-system

# Inspect and remove the taints that block scheduling
kubectl describe node master200 | grep Taint
sudo kubectl taint node master200 env_role:NoSchedule-
sudo kubectl taint node slave201 env_role:NoSchedule-
sudo kubectl taint node slave202 env_role:NoSchedule-
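To see every node's taints at a glance (note that the node.kubernetes.io/not-ready taint is removed automatically once the node becomes Ready, i.e. after a CNI plugin is installed):

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'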
Restarting the services
sudo systemctl restart kubelet
sudo systemctl restart containerd
Network plugin installation
CoreDNS is mandatory; of the other three plugins below (Flannel, Calico, Cilium), pick just one.
Check that kubelet is healthy
CNI plugins depend on kubelet; if kubelet is not pointed at the correct CNI directories, the plugin will fail:
journalctl -u kubelet -f | grep cni
systemctl restart kubelet
ls -l /opt/cni/bin/
ls -l /etc/cni/net.d/
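A node stays NotReady while /etc/cni/net.d is empty. After a CNI plugin is installed, a config file should appear there; for example, with Flannel (the exact filename may vary by version):

cat /etc/cni/net.d/10-flannel.conflist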
CoreDNS
DNS is a built-in Kubernetes service that is started automatically by the cluster add-on manager.
Since Kubernetes v1.12, CoreDNS has been the recommended DNS server, replacing kube-dns. If your cluster originally used kube-dns, it may still be running kube-dns rather than CoreDNS.
Reinstalling CoreDNS
# Remove the existing CoreDNS resources
sudo kubectl delete deployment coredns -n kube-system
sudo kubectl delete configmaps coredns -n kube-system
sudo kubectl delete clusterrolebindings system:coredns
sudo kubectl delete clusterroles system:coredns
sudo kubectl delete serviceaccounts coredns -n kube-system

# Fetch and run the official deployment script (requires jq)
wget https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/deploy.sh
chmod +x deploy.sh
apt -y install jq
wget https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/coredns.yaml.sed
./deploy.sh -i 10.96.0.10 > coredns.yaml
sudo kubectl apply -f coredns.yaml

# Watch until the pods are Running
sudo kubectl get pod --all-namespaces
sudo kubectl get pods -n kube-system -w
sudo kubectl get pods -n kube-system -l k8s-app=kube-dns

# Test DNS resolution from a busybox pod
vim busybox.yaml

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

sudo kubectl apply -f busybox.yaml
sudo watch kubectl get pods busybox
sudo kubectl exec -ti busybox -- nslookup kubernetes.default
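The -i 10.96.0.10 passed to deploy.sh must be the ClusterIP of the kube-dns Service, i.e. the .10 address of the service-cidr chosen at kubeadm init. It can be confirmed with:

sudo kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'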
Flannel
Flannel, a project developed by CoreOS, is probably the most straightforward and popular CNI plugin.
It is one of the most mature examples of a networking fabric for container orchestration systems, designed to provide better container-to-container and host-to-host networking. As the CNI concept took off, the Flannel CNI plugin was an early entry point.
Compared with other options, Flannel is relatively easy to install and configure.
curl -LO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sudo kubectl apply -f kube-flannel.yml
sudo watch kubectl get all -o wide -n kube-flannel
sudo kubectl get pod -n kube-flannel
sudo kubectl get pod --all-namespaces
kubectl get nodes
kubectl get nodes -o wide
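Flannel's default Network is 10.244.0.0/16, which matches the --pod-network-cidr used above; with a different CIDR, edit kube-flannel.yml before applying. The active setting can be read back from the plugin's configmap (name and key assumed from the stock manifest):

kubectl -n kube-flannel get cm kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'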
Calico
Calico is another popular networking choice in the Kubernetes ecosystem.
While Flannel is widely regarded as the simplest option, Calico is known for its performance and flexibility. Calico's feature set is broader: beyond networking between hosts and pods, it also covers network security and policy management. The Calico CNI plugin wraps Calico's functionality in the CNI framework.
# Install the Tigera operator
curl -LO https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
sudo kubectl apply -f tigera-operator.yaml
# to uninstall: sudo kubectl delete -f tigera-operator.yaml

# Adjust the pod CIDR to match --pod-network-cidr, then apply
curl -LO https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/custom-resources.yaml
sed -i 's/cidr: 192.168.0.0/cidr: 10.244.0.0/g' custom-resources.yaml
sudo kubectl apply -f custom-resources.yaml
# to uninstall: sudo kubectl delete -f custom-resources.yaml

# Watch the rollout
sudo watch kubectl get all -o wide -n calico-system
sudo kubectl describe pod csi-node-driver-ks4k9 -n calico-system
sudo kubectl get pod -n calico-system
sudo kubectl get pod --all-namespaces
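With the operator-based install, overall health can be read from the tigerastatus resource once reconciliation finishes (assuming the operator manifests above were applied):

sudo kubectl get tigerastatus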
Cilium
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.12/install/kubernetes/quick-install.yaml
The Cilium version must match your Kubernetes version; check the supported versions on the official site.
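If the Cilium CLI is installed, it can wait for the rollout and report the health of the installation:

cilium status --wait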
Installing the Dashboard
kubectl version
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

# Edit the kubernetes-dashboard Service in recommended.yaml to expose it as a NodePort:
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30012

sudo kubectl apply -f recommended.yaml
sudo kubectl get all -n kubernetes-dashboard
sudo kubectl get pod,svc -n kubernetes-dashboard
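With the NodePort above, the Dashboard is reachable over HTTPS on any node, e.g. (node IP from this guide; -k because the certificate is self-signed):

curl -k https://192.168.159.200:30012/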
Getting a token: method one
vim admin-token.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin
  namespace: kubernetes-dashboard
---
apiVersion: v1
kind: Secret
metadata:
  name: kubernetes-dashboard-token
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin"
type: kubernetes.io/service-account-token

sudo kubectl apply -f admin-token.yaml
sudo kubectl get secret -n kubernetes-dashboard
sudo kubectl describe secret kubernetes-dashboard-token -n kubernetes-dashboard
Getting a token: method two
kubectl create serviceaccount dashboard-admin-sa -n kubernetes-dashboard

kubectl create clusterrolebinding dashboard-admin-sa-binding \
  --clusterrole=cluster-admin \
  --serviceaccount=kubernetes-dashboard:dashboard-admin-sa

kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa dashboard-admin-sa -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode
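Note that on Kubernetes 1.24 and later (including the v1.28 installed here), ServiceAccounts no longer get a long-lived secret automatically, so the jsonpath lookup above may come back empty. A short-lived token can be requested directly instead:

kubectl -n kubernetes-dashboard create token dashboard-admin-sa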
Common questions and answers

Q1: Creating the ServiceAccount fails with a permissions error.
A: Make sure you have sufficient rights to create ServiceAccounts in the cluster; as a cluster administrator you can use the cluster-admin role.

Q2: The Secret cannot be found when fetching the token.
A: Confirm the ServiceAccount name is correct and that it was created successfully; list ServiceAccounts with kubectl get sa.

Q3: Login reports the token is invalid.
A: Confirm the token was copied in full and has not expired. Tokens are typically valid for 24 hours; after expiry, create a new one.

Q4: The Kubernetes Dashboard will not open.
A: Confirm the Dashboard is installed and running; check with kubectl get pods -n kubernetes-dashboard.

Q5: No cluster resources are visible after login.
A: Confirm the ServiceAccount is bound to the right ClusterRole and that the ClusterRole has sufficient permissions.
Results
Deploying Nginx
vim nginx-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deploy
  name: nginx-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-deploy
  template:
    metadata:
      labels:
        app: nginx-deploy
    spec:
      containers:
      - image: registry.cn-shenzhen.aliyuncs.com/xiaohh-docker/nginx:1.25.4
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-deploy
  name: nginx-svc
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30080
  selector:
    app: nginx-deploy
  type: NodePort
Deploy and verify
sudo kubectl apply -f nginx-deploy.yaml
sudo kubectl get all -o wide
kubectl get svc
kubectl describe service nginx-svc
kubectl get nodes
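Finally, the NodePort Service should serve the Nginx welcome page from any node (IP from this guide):

curl http://192.168.159.200:30080/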