How to check Network Problems

Problem

The metrics-server pod was added earlier, but you may hit a symptom like the one below, where routing between pods spread across different nodes does not work.

By default, once a k8s CNI addon has been installed, routing should work across the board: node ↔ node, pod ↔ pod, and node ↔ pod.
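Each of these paths can be checked directly before digging into component logs. A quick sketch (the node and pod IPs below are the ones used in this article's examples; substitute your own):

# Node-to-node: from a shell on k8s-01, ping the other node.
$ ping -c 2 10.0.2.6

# Node-to-pod / pod-to-pod: find a pod IP scheduled on the other node, then ping it.
$ kubectl get pods -A -owide   # note the IP and NODE columns
$ ping -c 2 <pod-ip-on-k8s-02>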

# Check the logs of the pods labeled k8s-app=metrics-server.
$ kubectl logs --tail=20 -n kube-system -l k8s-app=metrics-server
...
E0811 02:12:20.843207 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k8s-02: unable to fetch metrics from Kubelet k8s-02 (10.0.2.6): Get https://10.0.2.6:10250/stats/summary?only_cpu_and_memory=true: dial tcp 10.0.2.6:10250: connect: no route to host
E0811 02:13:16.345348 1 reststorage.go:135] unable to fetch node metrics for node "k8s-02": no metrics known for node
E0811 02:13:16.352394 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/netbox-86cdd5bdc6-jsbhn: no metrics known for pod
E0811 02:13:16.352406 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-proxy-cmp85: no metrics known for pod
E0811 02:13:16.352411 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-flannel-ds-amd64-5nktb: no metrics known for pod
E0811 05:23:16.347050 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/netbox-z6nbz: no metrics known for pod
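The "connect: no route to host" error points at an L3 reachability problem rather than an authentication or TLS issue, so it is worth reproducing the failure outside metrics-server first. A minimal check from a node shell (a sketch; the error line shown is what curl prints for this failure):

# From k8s-01, probe the failing kubelet port directly; any response other than
# "No route to host" would mean routing itself is fine.
$ curl -k https://10.0.2.6:10250/
curl: (7) Failed to connect to 10.0.2.6 port 10250: No route to host

# Check which route the kernel would pick for that destination.
$ ip route get 10.0.2.6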

Deploy netbox

Optimized application containers usually do not include any utilities you could use to inspect networking, even if you manage to get a shell inside them (often not even bash or ping).

So, if you deploy the k8s DaemonSet called netbox shown below, it becomes easy to use its assorted tools to check things such as whether the curl -k https://10.0.2.6:10250/stats/summary?only_cpu_and_memory=true query from the error above succeeds. (There are many other troubleshooting containers as well, e.g. ones that bundle tcpdump. Once you are used to this workflow, you will likely docker build your own image containing the utilities you use most.)
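For reference, a minimal sketch of such a self-built toolbox image (the base image, package list, and registry name are illustrative assumptions, not anything this cluster uses); the netbox deployment itself follows below:

# Hypothetical Dockerfile for a personal network-debugging image.
$ cat > Dockerfile <<'EOF'
FROM alpine:3.18
# bash, curl, tcpdump, dig (bind-tools), ip/ss (iproute2), ping (iputils)
RUN apk add --no-cache bash curl tcpdump bind-tools iproute2 iputils
# Keep the container alive so kubectl exec can attach at any time.
CMD ["sleep", "infinity"]
EOF
$ docker build -t <your-registry>/netutils:latest .
$ docker push <your-registry>/netutils:latest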

# For convenience, this uses the same namespace and serviceAccount as metrics-server.
# That way, it can reuse the role and token that metrics-server uses.
$ vim netbox.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: netbox
  name: netbox
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: netbox
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: netbox
    spec:
      serviceAccountName: metrics-server
      serviceAccount: metrics-server
      containers:
      - image: quay.io/gravitational/netbox:latest
        imagePullPolicy: Always
        name: netbox
        securityContext:
          runAsUser: 0
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

$ kubectl apply -f netbox.yaml
$ kubectl get pods -A -owide
NAMESPACE     NAME           READY   STATUS    RESTARTS   AGE     IP            NODE     NOMINATED NODE   READINESS GATES
...
kube-system   netbox-kvlsf   1/1     Running   0          3m16s   10.244.0.21   k8s-01   <none>           <none>
kube-system   netbox-svqdr   1/1     Running   0          3m16s   10.244.1.40   k8s-02   <none>           <none>
...

# Enter netbox-kvlsf.
$ kubectl exec -n kube-system -it netbox-kvlsf -- /bin/bash

# Inside netbox-kvlsf.
# Tools like telnet are not installed, so test port connectivity with a short Python snippet.
$ python
Python 3.3.6 (default, Sep 14 2017, 23:28:12)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.create_connection(('10.0.2.5', 10250))
<socket.socket object, fd=3, family=2, type=1, proto=6>  # Success!
>>> socket.create_connection(('10.0.2.6', 10250))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/socket.py", line 435, in create_connection
    raise err
  File "/usr/local/lib/python3.3/socket.py", line 426, in create_connection
    sock.connect(sa)
OSError: [Errno 113] No route to host  # Failure!

# If the connection succeeds, also call the actual kubelet API as shown below.
# The token used by metrics-server should be mounted at the path below.
$ cat /run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IjNqSm8xaXJ0MDZsaGxjdzVndWozY1A5VXBGbTdwX3VDUzBpd0J2a3ItR0EifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJtZXRyaWNzLXNlcnZlci10b2tlbi01bTdtdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJtZXRyaWNzLXNlcnZlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImI4NGZjMmRjLTQ1MDItNGJhNi1iNWE5LWEwMjA0NmVhOTdjNiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTptZXRyaWNzLXNlcnZlciJ9.LPzyvfQiT294NE-53AaVDkR9SV-AKJhs62g0LAX3iril2H3wvqfF2w6h0vz5SpZhVSLC9rKEHClbDSF1w88rdGr6bn3R4dlmogzb6nw2N1dcCHR8LnDlA2AbZsSBYAYrIWpYIV1mxu4r60HFPoGE3JbpnRxKeC3KKXEfhnOILDulox_xNyvLd46_T4wZqglwqJvo-Ogkl8GBlw8-kRr04_TXB1hrTuDCGfRnNpb7RGcBVHlsIq_qZFXMsWEGp_pGf24_nYQ5w-dOWlKPMeoZ44BfVS_mas6ZFdraFoiCdPXlNC3GeeN0t1n4fbix1VTxxJtsLCcwcY8aG3THCC0PHw

$ curl -k https://10.0.2.5:10250/stats/summary?only_cpu_and_memory=true -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IjNqSm8xaXJ0MDZsaGxjdzVndWozY1A5VXBGbTdwX3VDUzBpd0J2a3ItR0EifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJtZXRyaWNzLXNlcnZlci10b2tlbi01bTdtdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJtZXRyaWNzLXNlcnZlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImI4NGZjMmRjLTQ1MDItNGJhNi1iNWE5LWEwMjA0NmVhOTdjNiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTptZXRyaWNzLXNlcnZlciJ9.LPzyvfQiT294NE-53AaVDkR9SV-AKJhs62g0LAX3iril2H3wvqfF2w6h0vz5SpZhVSLC9rKEHClbDSF1w88rdGr6bn3R4dlmogzb6nw2N1dcCHR8LnDlA2AbZsSBYAYrIWpYIV1mxu4r60HFPoGE3JbpnRxKeC3KKXEfhnOILDulox_xNyvLd46_T4wZqglwqJvo-Ogkl8GBlw8-kRr04_TXB1hrTuDCGfRnNpb7RGcBVHlsIq_qZFXMsWEGp_pGf24_nYQ5w-dOWlKPMeoZ44BfVS_mas6ZFdraFoiCdPXlNC3GeeN0t1n4fbix1VTxxJtsLCcwcY8aG3THCC0PHw'
{
  "node": {
    "nodeName": "k8s-01",
    "systemContainers": [
      {
        "name": "pods",
        "startTime": "2020-08-06T23:38:19Z",
        "cpu": {
          "time": "2020-08-11T06:07:17Z",
          "usageNanoCores": 66319281,
          "usageCoreNanoSeconds": 34786278186285
        },
        "memory": {
          "time": "2020-08-11T06:07:17Z",
          "availableBytes": 3032084480,
...
# Output like this means it is working correctly.
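Rather than copy-pasting the token by hand, the same call can be made with command substitution (equivalent to the curl above, just less error-prone):

# Inside the netbox pod: read the mounted serviceaccount token straight into the header.
$ TOKEN=$(cat /run/secrets/kubernetes.io/serviceaccount/token)
$ curl -k -H "Authorization: Bearer ${TOKEN}" \
    "https://10.0.2.5:10250/stats/summary?only_cpu_and_memory=true"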

Resolve

If you have opened all the required ports in firewalld and the symptom still persists, the cause is most likely that masquerade has not been added/enabled in firewalld.

Refer to the page below to resolve it.

Open ports via Firewalld
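For quick reference, the key step is enabling masquerade on every node; a minimal sketch (zone and port details are covered on the linked page):

# On each node: allow firewalld to masquerade (NAT) forwarded traffic.
$ sudo firewall-cmd --permanent --add-masquerade
$ sudo firewall-cmd --reload

# Verify that masquerade is now active.
$ sudo firewall-cmd --query-masquerade
yes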

 

Related content

Setting up K8s Metrics Server Addon
Creating a single control-plane cluster with kubeadm
[Certain Version] Installation for test Env.
Using Service & Ingress
Highly Available topology of Control Plane
Service Discovery