How to check Network Problems
Problem
After adding the metrics-server pod earlier, you may find that traffic between pods on different nodes is not routed, as the log below shows.
Once a k8s CNI addon is installed, node-to-node, pod-to-pod, and node ↔ pod traffic should all be routed and reachable.
# Show the logs of the pods labeled k8s-app=metrics-server.
$ kubectl logs --tail=20 -n kube-system -l k8s-app=metrics-server
...
E0811 02:12:20.843207 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k8s-02: unable to fetch metrics from Kubelet k8s-02 (10.0.2.6): Get https://10.0.2.6:10250/stats/summary?only_cpu_and_memory=true: dial tcp 10.0.2.6:10250: connect: no route to host
E0811 02:13:16.345348 1 reststorage.go:135] unable to fetch node metrics for node "k8s-02": no metrics known for node
E0811 02:13:16.352394 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/netbox-86cdd5bdc6-jsbhn: no metrics known for pod
E0811 02:13:16.352406 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-proxy-cmp85: no metrics known for pod
E0811 02:13:16.352411 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-flannel-ds-amd64-5nktb: no metrics known for pod
E0811 05:23:16.347050 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/netbox-z6nbz: no metrics known for pod
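When the log is long, it can help to reduce it to the list of kubelets that could not be reached. The following is a minimal sketch, not part of the original procedure: the function name unreachable_kubelets and the regular expression are assumptions based only on the log format shown above, and other metrics-server versions may word the error differently.

```python
import re

# Pattern derived from the sample log line above (an assumption, not a
# documented format): "... unable to fetch metrics from Kubelet <node> (<ip>)
# ... dial tcp <ip>:<port>: connect: <reason>"
KUBELET_ERROR = re.compile(
    r"unable to fetch metrics from Kubelet (?P<node>\S+) \((?P<ip>[\d.]+)\)"
    r".*?dial tcp [\d.]+:(?P<port>\d+): connect: (?P<reason>.+)$"
)


def unreachable_kubelets(log_lines):
    """Return a sorted list of unique (node, ip, port, reason) tuples."""
    found = set()
    for line in log_lines:
        match = KUBELET_ERROR.search(line.rstrip("\n"))
        if match:
            found.add((
                match.group("node"),
                match.group("ip"),
                int(match.group("port")),
                match.group("reason").strip(),
            ))
    return sorted(found)
```

One way to use it is to save the kubectl logs output to a file and pass the open file object to unreachable_kubelets; each returned tuple names a node whose kubelet port should be tested from inside the cluster.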
Deploy netbox
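Optimized application containers usually ship without networking utilities, so even if you get a shell inside the container there is little you can check. (Many images lack even bash or ping.)
Deploying netbox as a k8s DaemonSet, as shown below, makes it easy to test with a variety of tools, for example whether the query curl -k -X GET https://10.0.2.6:10250/stats/summary?only_cpu_and_memory=true succeeds. (Many other troubleshooting containers exist, such as ones that include tcpdump. Once you are comfortable with this, you will probably build your own image with docker build, containing the utilities you use most.)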
# For convenience, netbox uses the same namespace and serviceAccount as metrics-server.
# This lets it use the role and token that metrics-server uses.
$ vim netbox.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: netbox
  name: netbox
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: netbox
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: netbox
    spec:
      serviceAccountName: metrics-server
      serviceAccount: metrics-server
      containers:
      - image: quay.io/gravitational/netbox:latest
        imagePullPolicy: Always
        name: netbox
        securityContext:
          runAsUser: 0
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
$ kubectl apply -f netbox.yaml
$ kubectl get pods -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...
kube-system netbox-kvlsf 1/1 Running 0 3m16s 10.244.0.21 k8s-01 <none> <none>
kube-system netbox-svqdr 1/1 Running 0 3m16s 10.244.1.40 k8s-02 <none> <none>
...
# Open a shell inside netbox-kvlsf.
$ kubectl exec -n kube-system -it netbox-kvlsf -- /bin/bash
# Inside netbox-kvlsf
# Tools such as telnet are not installed, so test port connectivity with a short Python snippet.
$ python
Python 3.3.6 (default, Sep 14 2017, 23:28:12)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.create_connection(('10.0.2.5', 10250))
<socket.socket object, fd=3, family=2, type=1, proto=6> # Success!
>>>
>>> socket.create_connection(('10.0.2.6', 10250))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.3/socket.py", line 435, in create_connection
raise err
File "/usr/local/lib/python3.3/socket.py", line 426, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host # Failed!
# If the connection succeeds, also call the actual kubelet API as shown below.
# The token that metrics-server uses should be mounted at the following path.
$ cat /run/secrets/kubernetes.io/serviceaccount/token
eyJhbGciOiJSUzI1NiIsImtpZCI6IjNqSm8xaXJ0MDZsaGxjdzVndWozY1A5VXBGbTdwX3VDUzBpd0J2a3ItR0EifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJtZXRyaWNzLXNlcnZlci10b2tlbi01bTdtdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJtZXRyaWNzLXNlcnZlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImI4NGZjMmRjLTQ1MDItNGJhNi1iNWE5LWEwMjA0NmVhOTdjNiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTptZXRyaWNzLXNlcnZlciJ9.LPzyvfQiT294NE-53AaVDkR9SV-AKJhs62g0LAX3iril2H3wvqfF2w6h0vz5SpZhVSLC9rKEHClbDSF1w88rdGr6bn3R4dlmogzb6nw2N1dcCHR8LnDlA2AbZsSBYAYrIWpYIV1mxu4r60HFPoGE3JbpnRxKeC3KKXEfhnOILDulox_xNyvLd46_T4wZqglwqJvo-Ogkl8GBlw8-kRr04_TXB1hrTuDCGfRnNpb7RGcBVHlsIq_qZFXMsWEGp_pGf24_nYQ5w-dOWlKPMeoZ44BfVS_mas6ZFdraFoiCdPXlNC3GeeN0t1n4fbix1VTxxJtsLCcwcY8aG3THCC0PHw
# Read the token from the mounted file instead of pasting it inline, and quote the URL so the shell does not interpret the "?".
$ curl -k -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token)" 'https://10.0.2.5:10250/stats/summary?only_cpu_and_memory=true'
{
  "node": {
    "nodeName": "k8s-01",
    "systemContainers": [
      {
        "name": "pods",
        "startTime": "2020-08-06T23:38:19Z",
        "cpu": {
          "time": "2020-08-11T06:07:17Z",
          "usageNanoCores": 66319281,
          "usageCoreNanoSeconds": 34786278186285
        },
        "memory": {
          "time": "2020-08-11T06:07:17Z",
          "availableBytes": 3032084480,
...
# Output like this means the request succeeded.
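The interactive socket test above can be wrapped in a small helper so that every node is checked in one pass. This is a minimal sketch: the names probe and probe_all are hypothetical, the default port 10250 is the kubelet port seen in the log, and the syntax is kept simple because the netbox image shown above ships Python 3.3.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, detail)."""
    try:
        # create_connection raises OSError (including timeouts and
        # "No route to host") when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as err:
        return False, str(err)


def probe_all(hosts, port=10250, timeout=3.0):
    """Probe the same port on every host and map host -> (ok, detail)."""
    return dict((host, probe(host, port, timeout)) for host in hosts)
```

Calling probe_all(['10.0.2.5', '10.0.2.6']) from inside the netbox pod on each node shows at a glance which node-to-node paths fail. A "No route to host" detail on only the remote nodes points at host firewall rules rather than at the kubelet itself.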
Resolve
If the symptom persists even after every required port has been opened in firewalld, the most likely cause is that masquerade has not been added or applied in firewalld.
Refer to the page below to resolve it.
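Independently of that page, a quick way to confirm the diagnosis on each node is to ask firewalld whether masquerade is active. This is a minimal sketch under stated assumptions: it relies on firewall-cmd --query-masquerade printing yes or no, the helper name masquerade_enabled is hypothetical, and which zone applies to your node interfaces is not stated in this document, so the zone argument is left optional.

```python
import subprocess


def masquerade_enabled(zone=None, run=subprocess.run):
    """Return True if firewalld reports masquerade as enabled.

    zone: optional firewalld zone name (assumption: the default zone is
          queried when omitted).
    run:  injectable runner, so the function can be exercised without
          firewalld installed.
    """
    cmd = ["firewall-cmd", "--query-masquerade"]
    if zone:
        cmd.insert(1, "--zone=" + zone)
    result = run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # firewall-cmd prints "yes" when masquerade is enabled, "no" otherwise.
    return result.stdout.strip() == "yes"
```

Run it as root on every node. If it returns False, masquerade is typically enabled with firewall-cmd --permanent --add-masquerade followed by firewall-cmd --reload, but follow the referenced page for the exact procedure used in this environment.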