
Notes on troubleshooting common problems in a Kubernetes cluster

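Preface
I am learning K8s, so I am writing these notes to help me remember.
This article is compiled from my reading notes on the book Kubernetes權威指南:從Docker到Kubernetes實踐全接觸 (The Definitive Guide to Kubernetes: From Docker to Kubernetes in Practice).
「The art of every age strives to give language to the sacred, silent desires within us. (Hermann Hesse, Peter Camenzind)」

「Because there is no concrete demo, this article is somewhat thin. It reads more like a set of guiding principles and can be dull. So the practical part comes first: a list of sites for looking up problems. I will add worked cases later when I get the chance. If you are short on time and need to solve a problem now, search the platforms below for your problem description.」
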
Sites for looking up problems
Monitoring, logging and debugging topics on the Kubernetes website: https://kubernetes.io/docs/tasks/debug-application-cluster/
Official Kubernetes forum: https://discuss.kubernetes.io/ (may need a proxy to reach)
Kubernetes issue list on GitHub: https://github.com/kubernetes/kubernetes/issues
Kubernetes questions on StackOverflow: https://stackoverflow.com/questions/tagged/kubernetes
Kubernetes Slack workspace: https://kubernetes.slack.com/ (requires a Google account)
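Troubleshooting common problems in a Kubernetes cluster
To track down problems in containerized applications running in a Kubernetes cluster, we commonly use the following approaches.

Inspect the current runtime information of the Kubernetes objects, especially the Events associated with them. These events record the subject involved, the first-seen time, the last-seen time, the count and the reason, and are very valuable for troubleshooting. An object's runtime data also exposes obvious problems such as wrong parameters, broken associations and abnormal states. Because many kinds of Kubernetes objects are related to each other, this step may involve checking several related objects.

For service and container problems, it may be necessary to go inside the container to diagnose the fault. In that case, the container's runtime logs can be used to pinpoint the problem.

For some complex problems, such as cluster-wide ones like Pod scheduling, you may need to combine the Kubernetes service logs from every node in the cluster. For example, collect the kube-apiserver, kube-scheduler and kube-controller-manager logs on the Master, and the kubelet and kube-proxy logs on each Node.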

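「Viewing system Events」
After creating a Pod in a Kubernetes cluster, we can list Pods with the kubectl get pods command, but it shows only limited information. Kubernetes provides the kubectl describe pod command to view the details of a single Pod.

kubectl describe pod shows the configuration the Pod was created with, its status, and the most recent Events related to the Pod. The event information is very useful for troubleshooting.

「If a Pod stays in the Pending state, kubectl describe can reveal the specific cause:」

No Node is available for scheduling. Possible causes include a Pod port conflict or the effect of Taints.
Resource quota management is enabled, but the target node for scheduling does not have enough resources.
The image failed to download, and so on.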
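
The checks above can be sketched as two small shell helpers. This is a minimal sketch, assuming kubectl is already configured for the cluster; the function names pending_pods and pod_events are made up for illustration and are not part of kubectl.

```shell
# Hypothetical helper: list the Pods, in every namespace, that are still Pending.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector status.phase=Pending
}

# Hypothetical helper: print the Events recorded for one Pod, oldest first.
# Usage: pod_events <pod-name> [namespace]
pod_events() {
  local pod="$1" ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=${pod}" \
    --sort-by=.metadata.creationTimestamp
}
```

Running pending_pods first and then pod_events on each name it prints gives roughly the Events part of kubectl describe pod, without the rest of the output.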
「Viewing Pod details」

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pods   etcd-vms81.liruilongs.github.io -n kube-system
# Basic information about the Pod
Name:                 etcd-vms81.liruilongs.github.io
Namespace:            kube-system
Priority:             2000001000
Priority Class Name: system-node-critical
Node:                 vms81.liruilongs.github.io/192.168.26.81
Start Time:           Tue, 25 Jan 2022 21:54:20 +0800
Labels:               component=etcd
                      tier=control-plane
Annotations:          kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.81:2379
                      kubernetes.io/config.hash: 1502584f9ab841720212d4341d723ba2
                      kubernetes.io/config.mirror: 1502584f9ab841720212d4341d723ba2
                      kubernetes.io/config.seen: 2021-12-13T00:01:04.834825537+08:00
                      kubernetes.io/config.source: file
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running # current running state of the Pod
IP:                   192.168.26.81
IPs:
IP:           192.168.26.81
Controlled By: Node/vms81.liruilongs.github.io
Containers:
etcd: # basic information about the container
    Container ID: docker://20d99a98a4c2590e8726916932790200ba1cf93c48f3c84ca1298ffdcaa4f28a
    Image:         registry.aliyuncs.com/google_containers/etcd:3.5.0-0
    Image ID:      docker-pullable://registry.aliyuncs.com/google_containers/etcd@sha256:9ce33ba33d8e738a5b85ed50b5080ac746deceed4a7496c550927a7a19ca3b6d
    Port:          <none>
    Host Port:     <none>
    Command: # startup arguments of the container
      etcd
      --advertise-client-urls=https://192.168.26.81:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --initial-advertise-peer-urls=https://192.168.26.81:2380
      --initial-cluster=vms81.liruilongs.github.io=https://192.168.26.81:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.81:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.168.26.81:2380
      --name=vms81.liruilongs.github.io
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    State:          Running
      Started:      Tue, 25 Jan 2022 21:54:20 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 24 Jan 2022 08:35:16 +0800
      Finished:     Tue, 25 Jan 2022 21:53:56 +0800
    Ready:          True
    Restart Count: 128
    Requests: # resources requested by the container
      cpu:        100m
      memory:     100Mi
    Liveness:     http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=8
    Startup:      http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=24
    Environment: <none>
    Mounts:
      /etc/kubernetes/pki/etcd from etcd-certs (rw)
      /var/lib/etcd from etcd-data (rw)
Conditions:    # conditions checked after the Pod starts
Type              Status
Initialized       True
Ready             True
ContainersReady   True
PodScheduled      True
Volumes:     # volumes mapped from the host; here they are defined as HostPath volumes
etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType: DirectoryOrCreate
etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/etcd
    HostPathType: DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute op=Exists
Events:            <none>
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
「Viewing the Nodes in the cluster and the details of a Node」

[root@liruilong k8s]# kubectl get nodes
NAME        STATUS    AGE
127.0.0.1   Ready     2d
[root@liruilong k8s]# kubectl describe node 127.0.0.1
# Basic Node information: name, labels, creation time, etc.
Name:                   127.0.0.1
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/hostname=127.0.0.1
Taints:                 <none>
CreationTimestamp:      Fri, 27 Aug 2021 00:07:09 +0800
Phase:
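# Current running state of the Node. After a Node starts it runs a series of self-checks:
# for example, whether the disk is full; if so, it is marked OutOfDisk=True
# otherwise it goes on to check whether memory is short (if so, it is marked MemoryPressure=True)
# finally, if everything is normal, it is set to the Ready state (Ready=True)
# This state means the Node is healthy and the Master can schedule new work on it (such as starting Pods)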
Conditions:
Type                  Status LastHeartbeatTime                       LastTransitionTime                      Reason                          Message
----                  ------ -----------------                       ------------------                      ------                          -------
OutOfDisk             False   Sun, 29 Aug 2021 23:05:53 +0800         Sat, 28 Aug 2021 00:30:35 +0800         KubeletHasSufficientDisk        kubelet has sufficient disk space available
MemoryPressure        False   Sun, 29 Aug 2021 23:05:53 +0800         Fri, 27 Aug 2021 00:07:09 +0800         KubeletHasSufficientMemory      kubelet has sufficient memory available
DiskPressure          False   Sun, 29 Aug 2021 23:05:53 +0800         Fri, 27 Aug 2021 00:07:09 +0800         KubeletHasNoDiskPressure        kubelet has no disk pressure
Ready                 True    Sun, 29 Aug 2021 23:05:53 +0800         Sat, 28 Aug 2021 00:30:35 +0800         KubeletReady                    kubelet is posting ready status
# Host addresses and hostname of the Node.
Addresses:              127.0.0.1,127.0.0.1,127.0.0.1
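# Total resources on the Node: the system resources the Node has, including CPU, memory and the maximum number of schedulable Pods. Note that the Kubernetes version shown here (v1.5.2) already has experimental support for GPU resource allocation (alpha.kubernetes.io/nvidia-gpu=0)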
Capacity:
alpha.kubernetes.io/nvidia-gpu:        0
cpu:                                   1
memory:                                1882012Ki
pods:                                  110
# Allocatable resources: the amount of resources on the Node currently available for allocation.
Allocatable:
alpha.kubernetes.io/nvidia-gpu:        0
cpu:                                   1
memory:                                1882012Ki
pods:                                  110
# Host system information: the host's unique UUID, Linux kernel version, operating system type and version, and the kubelet and kube-proxy version numbers.
System Info:
Machine ID:                    963c2c41b08343f7b063dddac6b2e486
System UUID:                   EB90EDC4-404C-410B-800F-3C65816C0E2D
Boot ID:                       4a9349b0-ce4b-4b4a-8766-c5c4256bb80b
Kernel Version:                3.10.0-1160.15.2.el7.x86_64
OS Image:                      CentOS Linux 7 (Core)
Operating System:              linux
Architecture:                  amd64
Container Runtime Version:     docker://1.13.1
Kubelet Version:               v1.5.2
Kube-Proxy Version:            v1.5.2
ExternalID:                     127.0.0.1
# Summary of the Pods currently running
Non-terminated Pods:            (3 in total)
Namespace                     Name                    CPU Requests    CPU Limits      Memory Requests Memory Limits
---------                     ----                    ------------    ----------      --------------- -------------
default                       mysql-2cpt9             0 (0%)          0 (0%)          0 (0%)          0 (0%)
default                       myweb-53r32             0 (0%)          0 (0%)          0 (0%)          0 (0%)
default                       myweb-609w4             0 (0%)          0 (0%)          0 (0%)          0 (0%)
# Summary of allocated resources, for example requests (the minimum) and limits (the maximum allowed) as a percentage of the Node's total.
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.
CPU Requests CPU Limits      Memory Requests Memory Limits
------------ ----------      --------------- -------------
0 (0%)        0 (0%)          0 (0%)          0 (0%)
# Events related to the Node.
Events:
FirstSeen     LastSeen        Count   From                    SubObjectPath   Type            Reason                  Message
---------     --------        -----   ----                    -------------   --------        ------                  -------
4h            27m             3       {kubelet 127.0.0.1}                     Warning         MissingClusterDNS       kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. pod: "myweb-609w4_default(01d719dd-08b1-11ec-9d6a-00163e1220cb)". Falling back to DNSDefault policy.
25m           25m             1       {kubelet 127.0.0.1}                     Warning         MissingClusterDNS       kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. pod: "mysql-2cpt9_default(1c9353ba-08d7-11ec-9d6a-00163e1220cb)". Falling back to DNSDefault policy.
Viewing container logs
To inspect the logs produced by the application inside a container, use the kubectl logs <pod_name> command.

Here we print the logs of the etcd database and look for anomalies by filtering on the keyword error.

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs etcd-vms81.liruilongs.github.io -n kube-system | grep -i error | head -5
{"level":"info","ts":"2022-01-25T13:54:33.191Z","caller":"wal/repair.go:96","msg":"repaired","path":"/var/lib/etcd/member/wal/0000000000000014-0000000000185aba.wal","error":"unexpected EOF"}
{"level":"info","ts":"2022-01-25T13:54:33.192Z","caller":"etcdserver/storage.go:109","msg":"repaired WAL","error":"unexpected EOF"}
{"level":"warn","ts":"2022-01-25T13:54:33.884Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"127.0.0.1:53950","server-name":"","error":"EOF"}
{"level":"warn","ts":"2022-01-25T13:54:33.885Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"127.0.0.1:53948","server-name":"","error":"EOF"}
{"level":"warn","ts":"2022-01-28T03:00:37.549Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"628.230855ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/runtimeclasses/\" range_end:\"/registry/runtimeclasses0\" count_only:true ","response":"","error":"context canceled"}
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
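
The etcd Pod described earlier shows a Restart Count of 128 and a Last State of Terminated, and in that situation the logs of the previous container instance are often more telling than the current ones. The following is a minimal sketch, assuming kubectl is configured; the helper name crash_logs is made up for illustration, while --previous and --tail are standard kubectl logs flags.

```shell
# Hypothetical helper: show the last lines logged by the previous (terminated)
# instance of a Pod's container, then the last lines of the current instance.
# Usage: crash_logs <pod-name> [namespace] [lines]
crash_logs() {
  local pod="$1" ns="${2:-default}" lines="${3:-50}"
  echo "== previous instance =="
  kubectl logs "$pod" -n "$ns" --previous --tail="$lines" \
    || echo "(no previous instance, or its logs are no longer available)"
  echo "== current instance =="
  kubectl logs "$pod" -n "$ns" --tail="$lines"
}
```

For the Pod above this would be called as crash_logs etcd-vms81.liruilongs.github.io kube-system 20.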
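Viewing Kubernetes service logs
If Kubernetes is installed on Linux and its services are managed by systemd, the systemd journal takes over the output logs of those services. In this environment you can use systemctl status or the journalctl tool to view the service logs. For example:

View the startup information of a service. From this you can locate the configuration files the service loads and the startup arguments it was given.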

┌──[root@vms81.liruilongs.github.io]-[~]
└─$systemctl status kubelet.service -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 二 2022-01-25 21:53:35 CST; 6 days ago
     Docs: https://kubernetes.io/docs/
Main PID: 1014 (kubelet)
   Memory: 208.2M
   CGroup: /system.slice/kubelet.service
           └─1014 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.5

2月 01 17:47:14 vms81.liruilongs.github.io kubelet[1014]: W0201 17:47:14.258523    1014 container.go:586] Failed to update stats for container "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode1b874bfdef201d69db10b200b8f47d5.slice/docker-c20fa960cfebd38172e123a5d87ecd499518bf22381f7aaa62d57131e7eb1aae.scope": unable to determine device info for dir: /var/lib/docker/overlay2/07d7695f2c479fbd0b654016345fcbacd0838276fb57f8291f993ed6799fae8d/diff: stat failed on /var/lib/docker/overlay2/07d7695f2c479fbd0b654016345fcbacd0838276fb57f8291f993ed6799fae8d/diff with error: no such file or directory, continuing to push stats
......
Use journalctl to view the logs of a service. Here -u selects the kubelet unit, and grep keeps only the lines that contain the keyword error.

┌──[root@vms81.liruilongs.github.io]-[~]
└─$journalctl -u kubelet.service | grep -i error | head -2
1月 25 21:53:55 vms81.liruilongs.github.io kubelet[1014]: I0125 21:53:55.865441    1014 docker_service.go:264] "Docker Info" dockerInfo=&{ID:HN3K:C6LG:QGV7:N2CG:VELF:CJ6T:HFR5:EEKH:HLPO:CDEU:GN3E:QAJJ Containers:32 ContainersRunning:11 ContainersPaused:0 ContainersStopped:21 Images:32 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:26 OomKillDisable:true NGoroutines:39 SystemTime:2022-01-25T21:53:55.833509372+08:00 LoggingDriver:json-file CgroupDriver:systemd CgroupVersion:1 NEventsListener:0 KernelVersion:3.10.0-693.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSVersion:7 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000a8f960 NCPU:2 MemTotal:4126896128 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:vms81.liruilongs.github.io Labels:[] ExperimentalBuild:false ServerVersion:20.10.9 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:5b46e404f6b9f661a205e28d59c982d3634148f8 Expected:5b46e404f6b9f661a205e28d59c982d3634148f8} RuncCommit:{ID:v1.0.2-0-g52b36a2 Expected:v1.0.2-0-g52b36a2} 
InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[]
1月 25 21:53:56 vms81.liruilongs.github.io kubelet[1014]: E0125 21:53:56.293100    1014 controller.go:144] failed to ensure lease exists, will retry in 200ms, error: Get "https://192.168.26.81:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/vms81.liruilongs.github.io?timeout=10s": dial tcp 192.168.26.81:6443: connect: connection refused
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
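
In the output above, the first matched line is an I (info) line that matched only because the dumped Docker info contains the text "Error:". kubelet logs in the klog format, where each message starts with a severity letter (I, W, E or F) followed by the month and day, so filtering on that header is more precise than grepping for the word error. The following is a minimal sketch; the function name klog_errors is made up for illustration.

```shell
# Keep only klog lines whose severity is Error (E) or Fatal (F).
# Reads journalctl-style lines from stdin. The klog header looks like
# "E0125 21:53:56.293100    1014 controller.go:144] ...".
klog_errors() {
  grep -E '(^|: )[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ '
}

# Example against a live node (needs systemd and a kubelet unit):
#   journalctl -u kubelet.service --since "1 hour ago" --no-pager | klog_errors
```

Adding W to the character class would include warnings as well.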
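「If systemd is not used to capture the standard output of the Kubernetes services, the log directory can instead be set through the log-related startup parameters. The startup parameters in use can be found by inspecting the Pod definition.」

View the startup arguments of kube-controller-manager and its authentication-related configuration files.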

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pod kube-controller-manager-vms81.liruilongs.github.io -n kube-system | grep -i -A 20 command
    Command:
      kube-controller-manager
      --allocate-node-cidrs=true
      --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
      --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
      --bind-address=127.0.0.1
      --client-ca-file=/etc/kubernetes/pki/ca.crt
      --cluster-cidr=10.244.0.0/16
      --cluster-name=kubernetes
      --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
      --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
      --controllers=*,bootstrapsigner,tokencleaner
      --kubeconfig=/etc/kubernetes/controller-manager.conf
      --leader-elect=true
      --port=0
      --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
      --root-ca-file=/etc/kubernetes/pki/ca.crt
      --service-account-private-key-file=/etc/kubernetes/pki/sa.key
      --service-cluster-ip-range=10.96.0.0/12
      --use-service-account-credentials=true
    State:          Running
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pod kube-controller-manager-vms81.liruilongs.github.io -n kube-system | grep kubeconfig
      --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
      --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
      --kubeconfig=/etc/kubernetes/controller-manager.conf
      /etc/kubernetes/controller-manager.conf from kubeconfig (ro)
kubeconfig:
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl describe pod kube-controller-manager-vms81.liruilongs.github.io -n kube-system | grep -i -A 20 Volumes
Volumes:
ca-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/ssl/certs
    HostPathType: DirectoryOrCreate
etc-pki:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/pki
    HostPathType: DirectoryOrCreate
flexvolume-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    HostPathType: DirectoryOrCreate
k8s-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType: DirectoryOrCreate
kubeconfig:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/controller-manager.conf
    HostPathType: FileOrCreate
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
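「Problems related to Pod objects」, such as a Pod that cannot be created, a Pod that stops right after starting, or Pod replicas that cannot be scaled up. In these cases, first determine which node the Pod is on, then log in to that node and look up the Pod's complete log in the kubelet logs to troubleshoot.

「For problems related to Pod scaling or to RCs (ReplicationControllers)」, the key clues can most likely be found in the kube-controller-manager and kube-scheduler logs.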

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs kube-scheduler-vms81.liruilongs.github.io
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs kube-controller-manager-vms81.liruilongs.github.io
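「kube-proxy is often overlooked, because even if it stops unexpectedly」, the Pod status stays normal, but access to some services will fail. Such errors are usually closely tied to the kube-proxy service on each node. When you hit them, first check the kube-proxy logs, and also check the firewall service, paying special attention to any suspicious rules that were added by hand.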

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl logs kube-proxy-tbwz5
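
The kubectl logs commands above rely on the namespace of the current context. A minimal sketch that does not depend on it, and that covers every node at once, is shown below. It assumes the kubeadm defaults (namespace kube-system, Pod label k8s-app=kube-proxy); the helper name kube_proxy_logs is made up for illustration.

```shell
# Hypothetical helper: print recent kube-proxy logs from every node.
# Assumes kubeadm defaults: namespace kube-system, label k8s-app=kube-proxy.
# Usage: kube_proxy_logs [lines]
kube_proxy_logs() {
  local lines="${1:-20}"
  # --prefix tags each line with the Pod it came from.
  kubectl logs -n kube-system -l k8s-app=kube-proxy --tail="$lines" --prefix
}
```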
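Common problems
A Pod stays in the Pending state because the pause image cannot be downloaded.
A Pod is created successfully but its RESTARTS count keeps increasing: the container's startup command does not stay running in the foreground.
A service cannot be reached by its service name
In a Kubernetes cluster you should use service names to reach running microservices wherever possible, but sometimes the access fails. A service involves DNS resolution of the service name, load distribution by the kube-proxy component, and the state of the backend Pod list, so the problem can be approached from the following angles.

「1. Check whether the Service's backend Endpoints are healthy」

Use the kubectl get endpoints <service name> command to view the list of backend Endpoints for a service. If the list is empty (as for metrics-server in the output that follows), the likely causes are the ones listed after the output.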

┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl get svc
NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
kube-dns                            ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP         50d
liruilong-kube-prometheus-kubelet   ClusterIP   None             <none>        10250/TCP,10255/TCP,4194/TCP   16d
metrics-server                      ClusterIP   10.111.104.173   <none>        443/TCP                        50d
┌──[root@vms81.liruilongs.github.io]-[~]
└─$kubectl get endpoints
NAME                                ENDPOINTS                                                                 AGE
kube-dns                            10.244.88.66:53,10.244.88.67:53,10.244.88.66:53 + 3 more...               50d
liruilong-kube-prometheus-kubelet   192.168.26.81:10250,192.168.26.82:10250,192.168.26.83:10250 + 6 more...   16d
metrics-server                      <none>                                                                    50d
┌──[root@vms81.liruilongs.github.io]-[~]
└─$
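The Service's Label Selector does not match the Labels of the Pods, so no Pod backs the service.
The backend Pods never reach the Ready state (check the Pod status further with kubectl get pods).
「The Service's targetPort does not match the Pod's containerPort」. That is, the port the container exposes is not the port the Service exposes, so targetPort must be used to forward to the container port.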
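
The first two causes can be checked by comparing the Service's selector with the Pods it actually matches. The following is a minimal sketch, assuming kubectl is configured; the helper name svc_backends is made up for illustration, and the selector is read with kubectl's go-template output format.

```shell
# Hypothetical helper: print a Service's selector, the Pods that match it,
# and the Service's Endpoints, so the three can be compared side by side.
# Usage: svc_backends <service-name> [namespace]
svc_backends() {
  local svc="$1" ns="${2:-default}" selector
  # Turn the selector map into "k1=v1,k2=v2," for use with -l.
  selector="$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"   # drop the trailing comma
  echo "selector: ${selector:-<none>}"
  if [ -n "$selector" ]; then
    kubectl get pods -n "$ns" -l "$selector" -o wide
  fi
  kubectl get endpoints "$svc" -n "$ns"
}
```

If the Pod list is empty, the selector is the problem; if Pods are listed but not Ready, the Endpoints stay empty until they pass their readiness checks.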
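「2. Check whether the Service name resolves correctly to the ClusterIP address」

From inside a client container, run ping <service-name>.<namespace>.svc to check. If the Service's ClusterIP address is returned, the DNS service is resolving the Service name correctly. If it is not, the cluster's DNS service may be malfunctioning. What matters here is the address the name resolves to; depending on the kube-proxy mode, the ping replies themselves may never come back.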
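
When no client container is at hand, the same check can be run from a throwaway Pod. The following is a minimal sketch under stated assumptions: the cluster domain is taken to be the common default cluster.local, the Pod name dns-check and both function names are made up for illustration, and the busybox:1.28 image must be pullable from the cluster.

```shell
# Build the cluster-internal DNS name of a Service.
# "cluster.local" is the common default cluster domain; it is an assumption here.
# Usage: svc_fqdn <service-name> [namespace] [cluster-domain]
svc_fqdn() {
  local svc="$1" ns="${2:-default}" domain="${3:-cluster.local}"
  echo "${svc}.${ns}.svc.${domain}"
}

# Resolve the name from a temporary Pod that is deleted when nslookup exits.
# Usage: check_svc_dns <service-name> [namespace]
check_svc_dns() {
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.28 \
    -- nslookup "$(svc_fqdn "$@")"
}
```

For the kube-dns Service listed earlier, check_svc_dns kube-dns kube-system should return 10.96.0.10 if cluster DNS is healthy.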

「3. Check whether kube-proxy's forwarding rules are correct」

kube-proxy can be set to either the IPVS or the iptables load-distribution mode.

In IPVS mode, use the ipvsadm tool to view the IPVS rules on the Node and check whether the rules for the Service's ClusterIP are set correctly.

In iptables mode, view the iptables rules on the Node and check whether the rules for the Service's ClusterIP are set correctly.
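
Both checks can be sketched as one helper that is run as root on the Node. This is a rough sketch, not a complete rule audit: the function name proxy_rules is made up for illustration, and it only shows the entries that mention the given ClusterIP.

```shell
# Hypothetical helper: show the kube-proxy rules on this Node that mention a
# Service's ClusterIP. Run as root on the Node itself.
# Usage: proxy_rules <cluster-ip> [mode]    (mode: ipvs | iptables)
proxy_rules() {
  local ip="$1" mode="${2:-iptables}"
  case "$mode" in
    ipvs)
      # -L lists the virtual server table, -n prints numeric addresses.
      # -A 5 also prints a few of the real servers listed under each match.
      ipvsadm -Ln | grep -A 5 -F -- " ${ip}:"
      ;;
    iptables)
      # kube-proxy programs its Service rules into the nat table.
      iptables-save -t nat | grep -F -- "-d ${ip}/32"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
}
```

An empty result for a Service that has Endpoints suggests kube-proxy on that Node is not syncing rules, which points back to its logs.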

Getting help
Websites and communities
Monitoring, logging and debugging topics on the Kubernetes website: https://kubernetes.io/docs/tasks/debug-application-cluster/

Official Kubernetes forum: https://discuss.kubernetes.io/ (may need a proxy to reach)

Kubernetes issue list on GitHub: https://github.com/kubernetes/kubernetes/issues

Kubernetes questions on StackOverflow: https://stackoverflow.com/questions/tagged/kubernetes

Kubernetes Slack workspace: https://kubernetes.slack.com/ (requires a Google account)

Author: 山河已無恙
