Kubernetes Network Analysis

Master machine

eth0 address: 192.168.199.238
docker0 address: 10.1.97.1

Runs the g2 service.

pi@piw:~ $ sudo iptables -t nat -L -v -n
Chain PREROUTING (policy ACCEPT 56 packets, 5242 bytes)
pkts bytes target prot opt in out source destination
69510 6306K KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */   # everything -> KUBE-SERVICES
1512 90080 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL   # traffic to local addresses -> DOCKER

Chain INPUT (policy ACCEPT 19 packets, 1130 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 13 packets, 747 bytes)
pkts bytes target prot opt in out source destination
5778 344K KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */   # everything -> KUBE-SERVICES
1 60 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL   # local destinations outside 127.0.0.0/8 -> DOCKER

Chain POSTROUTING (policy ACCEPT 13 packets, 747 bytes)
pkts bytes target prot opt in out source destination
26639 2310K KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */   # everything -> KUBE-POSTROUTING
20737 1954K MASQUERADE all -- * !docker0 10.1.97.0/24 0.0.0.0/0   # traffic from the docker0 subnet leaving through any other interface needs SNAT

Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0   # the DOCKER chain currently does nothing

Chain KUBE-MARK-MASQ (5 references)
pkts bytes target prot opt in out source destination
110 10384 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000   # set the 0x4000 mark

Chain KUBE-NODEPORTS (1 references)
pkts bytes target prot opt in out source destination

Chain KUBE-POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000

# Packets marked with 0x4000 need SNAT.
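The hand-off between KUBE-MARK-MASQ and KUBE-POSTROUTING comes down to two bit operations on the packet mark. The following is a minimal Python sketch of that logic, not kube-proxy code; the function names are made up for illustration.

```python
MASQ_MARK = 0x4000  # the bit used by the rules in the dump above


def mark_masq(mark: int) -> int:
    """Model of `MARK or 0x4000`: set the bit and keep every other bit."""
    return mark | MASQ_MARK


def needs_snat(mark: int) -> bool:
    """Model of `mark match 0x4000/0x4000`: test only the masked bit."""
    return (mark & MASQ_MARK) == MASQ_MARK


if __name__ == "__main__":
    print(hex(mark_masq(0x1)))       # other mark bits survive the OR
    print(needs_snat(mark_masq(0)))  # a marked packet gets masqueraded
    print(needs_snat(0x1))           # an unmarked packet is left alone
```

Because the mark is OR-ed in and matched under a mask, other users of the packet mark are not disturbed.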

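# 10.0.0.1 is the service address of the master; 192.168.199.238 is this machine's own address.

# Traffic sent to 10.0.0.1:443 is redirected (DNAT) to 192.168.199.238:6443, and traffic whose source is 192.168.199.238 is marked for SNAT. Why? A machine may have more than one route out, so there is no guarantee that replies retrace the request path. After a DNAT, the reply has to come back through the same point to have its address rewritten back; if it takes a different route, that rewrite never happens. Forcing SNAT as well guarantees that the reply returns the same way. See http://netsecinfo.blogspot.com/2008/02/snat-and-dnat-on-same-connection.html. With a single default route, which is the usual case, SNAT or DNAT alone is enough and the two do not need to be combined. In kube-proxy, this source-match rule covers the case where an endpoint connects to its own service address.

Here 192.168.199.238 happens to be the local machine, but it could just as well be another machine's address, as with the services below. DNAT rewrites the destination of packets sent to the service address, and SNAT additionally rewrites their source address, so that traffic in both directions passes through the same network interface.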

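With SNAT, traffic in both directions passes through eth0 and eth1.

x <--> eth0          # DNAT happens here
        |
       eth1 <--> y   # SNAT happens here

Without SNAT, the return path bypasses eth0, so the DNAT rewrite cannot be undone.

x --> eth0           # DNAT happens here
^       |
^---- eth1 <--> y

A later section works through an example of why the service address cannot be reached directly from the worker; that case involves exactly two interfaces (the physical NIC and the tunnel, eth0 and flannel0).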
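The two diagrams can be replayed as a toy model. The sketch below is plain Python, not real netfilter behaviour: the `NatBox` class, the `reply_seen_by_client` helper, and the choice of addresses are assumptions made for illustration, and it assumes (as in the second diagram) that a reply addressed straight to the client bypasses the NAT box.

```python
from dataclasses import dataclass

CLIENT = "192.168.1.12"   # x in the diagrams
SERVICE = "10.0.0.103"    # address the client connects to
BACKEND = "10.1.63.3"     # y in the diagrams
NAT_BOX = "10.1.9.0"      # address the NAT box uses when it applies SNAT


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class NatBox:
    """Toy NAT box that remembers how to undo its own rewrites."""

    def __init__(self, snat: bool):
        self.snat = snat
        self.conntrack = {}  # (reply src, reply dst) -> packet handed to the client

    def forward(self, pkt: Packet) -> Packet:
        # DNAT: service address -> backend; optionally SNAT the source as well.
        out = Packet(NAT_BOX if self.snat else pkt.src, BACKEND)
        self.conntrack[(out.dst, out.src)] = Packet(SERVICE, pkt.src)
        return out

    def reverse(self, reply: Packet) -> Packet:
        # Undo the rewrite for replies that actually pass through this box.
        return self.conntrack.get((reply.src, reply.dst), reply)


def reply_seen_by_client(snat: bool) -> Packet:
    nat = NatBox(snat)
    delivered = nat.forward(Packet(CLIENT, SERVICE))
    reply = Packet(delivered.dst, delivered.src)  # backend answers the source it saw
    if reply.dst == NAT_BOX:
        return nat.reverse(reply)  # reply retraces the path through the NAT box
    return reply                   # reply takes another route, nothing rewrites it


if __name__ == "__main__":
    print(reply_seen_by_client(snat=True))
    print(reply_seen_by_client(snat=False))
```

With SNAT the client receives a reply whose source is the service address it originally contacted; without it the reply carries the backend address, which the client does not associate with its connection.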

Chain KUBE-SEP-6BJZJ65EXTPP6DDP (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 192.168.199.238 0.0.0.0/0 /* default/kubernetes:https */
18 1080 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */ tcp to:192.168.199.238:6443   # DNAT -> 192.168.199.238:6443

# 10.0.0.10 is the DNS service address; 10.1.97.4 is the container address.

# Traffic to 10.0.0.10:53 is redirected to 10.1.97.4:53. Traffic whose source is 10.1.97.4 is marked for SNAT.

Chain KUBE-SEP-AYYJX6SWUVLQ32JT (1 references)
pkts bytes target prot opt in out source destination
110 10384 KUBE-MARK-MASQ all -- * * 10.1.97.4 0.0.0.0/0 /* kube-system/kube-dns:dns */
110 10384 DNAT udp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ udp to:10.1.97.4:53

# The chain above handles UDP; this one handles TCP.
Chain KUBE-SEP-C6CZV7HDHZ4XDC2D (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.4 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ tcp to:10.1.97.4:53

# 10.0.0.220 is the g2 service address; 10.1.97.2 is the container address.

# Traffic to 10.0.0.220:80 is redirected to 10.1.97.2:80. Traffic whose source is 10.1.97.2 is marked for SNAT.
Chain KUBE-SEP-JZ3RL2WX244TSVRU (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.2 0.0.0.0/0 /* default/g2: */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g2: */ tcp to:10.1.97.2:80

# 10.0.0.234 is the g3 service address; 10.1.97.5 is the container address. But g3 runs on another machine, so how does the container address get translated to a physical address?

# Traffic to 10.0.0.234:80 is redirected to 10.1.97.5:80. Traffic whose source is 10.1.97.5 is marked for SNAT.
Chain KUBE-SEP-URSRY67MZM3XT7NU (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.5 0.0.0.0/0 /* default/g3: */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g3: */ tcp to:10.1.97.5:80

# Redirect service addresses to container addresses.
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- * * 0.0.0.0/0 10.0.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
0 0 KUBE-SVC-GWYMBTJ7K7KSWVEF tcp -- * * 0.0.0.0/0 10.0.0.220 /* default/g2: cluster IP */ tcp dpt:80
0 0 KUBE-SVC-7FIXWSZP6P4B2WBX tcp -- * * 0.0.0.0/0 10.0.0.234 /* default/g3: cluster IP */ tcp dpt:80
18 1080 KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- * * 0.0.0.0/0 10.0.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
110 10384 KUBE-SVC-TCOU7JCQXEZGVUNU udp -- * * 0.0.0.0/0 10.0.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
3 170 KUBE-NODEPORTS all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

Chain KUBE-SVC-7FIXWSZP6P4B2WBX (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-URSRY67MZM3XT7NU all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g3: */

Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-C6CZV7HDHZ4XDC2D all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */

Chain KUBE-SVC-GWYMBTJ7K7KSWVEF (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-JZ3RL2WX244TSVRU all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g2: */

Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
pkts bytes target prot opt in out source destination
18 1080 KUBE-SEP-6BJZJ65EXTPP6DDP all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
pkts bytes target prot opt in out source destination
110 10384 KUBE-SEP-AYYJX6SWUVLQ32JT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */
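The dump above forms a three-step lookup: KUBE-SERVICES matches the cluster IP and port, the KUBE-SVC-* chain picks an endpoint chain, and the KUBE-SEP-* chain performs the DNAT. As a rough sketch, the same walk can be written as three Python dictionaries filled in from the master's dump; the `resolve` function is a made-up helper, and real KUBE-SVC chains may hold several endpoints chosen at random, which this single-endpoint setup does not exercise.

```python
# (protocol, cluster IP, port) -> service chain, copied from KUBE-SERVICES
KUBE_SERVICES = {
    ("tcp", "10.0.0.10", 53): "KUBE-SVC-ERIFXISQEP7F7OF4",
    ("tcp", "10.0.0.220", 80): "KUBE-SVC-GWYMBTJ7K7KSWVEF",
    ("tcp", "10.0.0.234", 80): "KUBE-SVC-7FIXWSZP6P4B2WBX",
    ("tcp", "10.0.0.1", 443): "KUBE-SVC-NPX46M4PTMTKRN6Y",
    ("udp", "10.0.0.10", 53): "KUBE-SVC-TCOU7JCQXEZGVUNU",
}

# service chain -> endpoint chain (one endpoint per service in this cluster)
KUBE_SVC = {
    "KUBE-SVC-7FIXWSZP6P4B2WBX": "KUBE-SEP-URSRY67MZM3XT7NU",
    "KUBE-SVC-ERIFXISQEP7F7OF4": "KUBE-SEP-C6CZV7HDHZ4XDC2D",
    "KUBE-SVC-GWYMBTJ7K7KSWVEF": "KUBE-SEP-JZ3RL2WX244TSVRU",
    "KUBE-SVC-NPX46M4PTMTKRN6Y": "KUBE-SEP-6BJZJ65EXTPP6DDP",
    "KUBE-SVC-TCOU7JCQXEZGVUNU": "KUBE-SEP-AYYJX6SWUVLQ32JT",
}

# endpoint chain -> DNAT target
KUBE_SEP = {
    "KUBE-SEP-6BJZJ65EXTPP6DDP": ("192.168.199.238", 6443),
    "KUBE-SEP-AYYJX6SWUVLQ32JT": ("10.1.97.4", 53),
    "KUBE-SEP-C6CZV7HDHZ4XDC2D": ("10.1.97.4", 53),
    "KUBE-SEP-JZ3RL2WX244TSVRU": ("10.1.97.2", 80),
    "KUBE-SEP-URSRY67MZM3XT7NU": ("10.1.97.5", 80),
}


def resolve(proto: str, ip: str, port: int):
    """Follow KUBE-SERVICES -> KUBE-SVC-* -> KUBE-SEP-* like a packet would."""
    svc_chain = KUBE_SERVICES.get((proto, ip, port))
    if svc_chain is None:
        return None  # no rule matched, so the packet is not rewritten
    return KUBE_SEP[KUBE_SVC[svc_chain]]


if __name__ == "__main__":
    print(resolve("tcp", "10.0.0.1", 443))
    print(resolve("udp", "10.0.0.10", 53))
```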

Worker machine

docker0 address: 10.1.9.1

Runs the g3 service.

[root@kubepi3 ~]# iptables -t nat -v -L -n
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
135 38306 KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
47 17274 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
2296 143K KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
2 138 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
2258 140K KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
2 138 MASQUERADE all -- * !docker0 10.1.9.0/24 0.0.0.0/0

Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0

Chain KUBE-MARK-MASQ (5 references)
pkts bytes target prot opt in out source destination
0 0 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000

Chain KUBE-NODEPORTS (1 references)
pkts bytes target prot opt in out source destination

Chain KUBE-POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000

#10.0.0.1:443 -> 192.168.199.238:6443
Chain KUBE-SEP-6BJZJ65EXTPP6DDP (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 192.168.199.238 0.0.0.0/0 /* default/kubernetes:https */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */ tcp to:192.168.199.238:6443

#10.0.0.10:53 ->10.1.97.4:53
Chain KUBE-SEP-AYYJX6SWUVLQ32JT (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.4 0.0.0.0/0 /* kube-system/kube-dns:dns */
0 0 DNAT udp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */ udp to:10.1.97.4:53

Chain KUBE-SEP-C6CZV7HDHZ4XDC2D (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.4 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */ tcp to:10.1.97.4:53

#10.0.0.220:80 -> 10.1.97.2:80
Chain KUBE-SEP-JZ3RL2WX244TSVRU (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.2 0.0.0.0/0 /* default/g2: */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g2: */ tcp to:10.1.97.2:80

#10.0.0.234:80 -> 10.1.97.5:80
Chain KUBE-SEP-URSRY67MZM3XT7NU (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ all -- * * 10.1.97.5 0.0.0.0/0 /* default/g3: */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g3: */ tcp to:10.1.97.5:80

# Redirect service addresses to container addresses.
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- * * 0.0.0.0/0 10.0.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
0 0 KUBE-SVC-TCOU7JCQXEZGVUNU udp -- * * 0.0.0.0/0 10.0.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
0 0 KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- * * 0.0.0.0/0 10.0.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
0 0 KUBE-SVC-GWYMBTJ7K7KSWVEF tcp -- * * 0.0.0.0/0 10.0.0.220 /* default/g2: cluster IP */ tcp dpt:80
0 0 KUBE-SVC-7FIXWSZP6P4B2WBX tcp -- * * 0.0.0.0/0 10.0.0.234 /* default/g3: cluster IP */ tcp dpt:80
0 0 KUBE-NODEPORTS all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

Chain KUBE-SVC-7FIXWSZP6P4B2WBX (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-URSRY67MZM3XT7NU all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g3: */

Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-C6CZV7HDHZ4XDC2D all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns-tcp */

Chain KUBE-SVC-GWYMBTJ7K7KSWVEF (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-JZ3RL2WX244TSVRU all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/g2: */

Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-6BJZJ65EXTPP6DDP all -- * * 0.0.0.0/0 0.0.0.0/0 /* default/kubernetes:https */

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-AYYJX6SWUVLQ32JT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-system/kube-dns:dns */

Checking g3's container address shows that it really is 10.1.97.5, which is in the same subnet as the master's docker0.

pi@piw:~ $ kubectl exec g3-1645837653-ce1js -ti ash
/ #
/ # ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0A:01:61:05
inet addr:10.1.97.5 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::42:aff:fe01:6105/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:554 errors:0 dropped:0 overruns:0 frame:0
TX packets:44 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:182794 (178.5 KiB) TX bytes:4517 (4.4 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

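Only at this point did I notice that the g3 service, which had been running on the worker, had suddenly moved to the master (probably because a network outage along the way broke the connection between master and worker). So the test results above do not prove anything!

It may be a problem with the worker: every pod created afterwards was interrupted and then automatically rescheduled onto the master, so running pods on both the master and the worker was no longer possible. To keep testing, the pods stay on the master and Docker containers are started by hand on the worker.

First, a brief summary of the current environment, because it changed again later.

SkyDNS is enabled on the master, so service names can be resolved through 10.0.0.10.

flannel is the network driver, using the udp backend. Because the master and the worker are not on the same LAN, the host-gw backend, which requires direct layer-2 connectivity between hosts, cannot be used.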

master

eth0 address: 192.168.199.238
docker0 address: 10.1.63.1

$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
g4 10.0.0.103 80/TCP 3d #container 10.1.63.3:80
g5 10.0.0.2 80/TCP 16h #container 10.1.63.4:80
kubernetes 10.0.0.1 443/TCP 5d

$ cat /etc/resolv.conf
domain lan
search default.svc.cluster.local svc.cluster.local cluster.local lan # full domain search path
nameserver 10.0.0.10 #skydns
nameserver 192.168.199.1

~$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default OpenWrt.lan 0.0.0.0 UG 202 0 0 eth0
10.1.0.0 * 255.255.0.0 U 0 0 0 flannel0
10.1.63.0 * 255.255.255.0 U 0 0 0 docker0
link-local * 255.255.0.0 U 210 0 0 veth9b7959f
link-local * 255.255.0.0 U 212 0 0 veth9608905
link-local * 255.255.0.0 U 226 0 0 vethcae16c6
link-local * 255.255.0.0 U 236 0 0 docker0
link-local * 255.255.0.0 U 240 0 0 veth52e22c5
link-local * 255.255.0.0 U 244 0 0 veth827f0d2
link-local * 255.255.0.0 U 246 0 0 veth7acdc4c
192.168.199.0 * 255.255.255.0 U 202 0 0 eth0

~$ etcdctl get "/coreos.com/network/config"
{ "Network": "10.1.0.0/16", "Backend": { "Type": "udp" } }

worker

eth0 address: 192.168.1.12
docker0 address: 10.1.9.1

~# cat /etc/resolv.conf
nameserver 10.0.0.10
nameserver 192.168.1.1
search default.svc.cluster.local svc.cluster.local cluster.local

~# route -n
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.1 0.0.0.0 UG 1024 0 0 eth0
10.1.0.0 0.0.0.0 255.255.0.0 U 0 0 0 flannel0
10.1.9.0 0.0.0.0 255.255.255.0 U 0 0 0 docker0
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
192.168.1.1 0.0.0.0 255.255.255.255 UH 1024 0 0 eth0
192.168.199.0 192.168.1.4 255.255.255.0 UG 0 0 0 eth0

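kube-proxy on every machine watches the cluster state (through the API server, which is backed by etcd) to learn all service addresses (10.0.0.0/24), and uses iptables to map them to container addresses (10.1.0.0/16). flannel then guarantees connectivity across 10.1.0.0/16.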
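How flannel's udp backend decides where to send an encapsulated packet can be pictured as a subnet-to-host table. The sketch below is an illustration only: the `LEASES` table is reconstructed by hand from the docker0 and eth0 addresses listed above rather than read from etcd, `outer_destination` is a hypothetical helper, and port 8285 is the UDP port that shows up in the packet captures later in this article.

```python
import ipaddress

# docker0 subnet of each host -> that host's eth0 address (taken from this article)
LEASES = {
    "10.1.63.0/24": "192.168.199.238",  # master
    "10.1.9.0/24": "192.168.1.12",      # worker
}
FLANNEL_UDP_PORT = 8285


def outer_destination(inner_dst: str):
    """Return the (host, port) an overlay packet would be tunnelled to."""
    addr = ipaddress.ip_address(inner_dst)
    for subnet, host in LEASES.items():
        if addr in ipaddress.ip_network(subnet):
            return host, FLANNEL_UDP_PORT
    return None  # not a container address, flannel does not carry it


if __name__ == "__main__":
    print(outer_destination("10.1.63.3"))   # container on the master
    print(outer_destination("10.0.0.103"))  # service address, not in the overlay
```

The second call returning `None` is the point: service addresses only become routable after iptables has rewritten them to container addresses.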

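On the worker, container addresses can be used directly, because the routing table has an entry for 10.1.0.0/16.
~# curl 10.1.63.3

WELCOME TO NGINX

Service addresses do not work, however. The routing table has no entry for 10.0.0.0/24, so the route lookup falls through to the default gateway 192.168.1.1.
~# curl 10.0.0.103
No response; the analysis below explains why.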
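The difference between the two curl calls is visible from a longest-prefix-match lookup against the worker's routing table. Here is a small Python sketch using the standard `ipaddress` module; the table is copied from the `route -n` output above, the `lookup` helper is made up for illustration, and route metrics are ignored for simplicity.

```python
import ipaddress

# (destination, gateway, interface) rows from `route -n` on the worker
WORKER_ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "eth0"),
    ("10.1.0.0/16", None, "flannel0"),
    ("10.1.9.0/24", None, "docker0"),
    ("192.168.1.0/24", None, "eth0"),
    ("192.168.1.1/32", None, "eth0"),
    ("192.168.199.0/24", "192.168.1.4", "eth0"),
]


def lookup(dst: str):
    """Pick the matching route with the longest prefix; return (iface, gateway)."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gw, dev)
        for net, gw, dev in WORKER_ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    _, gw, dev = max(matches, key=lambda row: row[0].prefixlen)
    return dev, gw


if __name__ == "__main__":
    print(lookup("10.1.63.3"))   # container address
    print(lookup("10.0.0.103"))  # service address
```

The container address matches the 10.1.0.0/16 route on flannel0, while the service address only matches the default route on eth0.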

Run an alpine container on the worker
~# docker run -ti alpine /bin/ash

Inside the container
~# curl 10.0.0.103 # the service IP is reachable

WELCOME TO NGINX

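~# cat /etc/resolv.conf # same resolv.conf as the host
nameserver 10.0.0.10
nameserver 192.168.1.1 # points at the upstream ISP's DNS, which may answer unknown names with the IP of a navigation page and confuse the results; comment it out if necessary
search default.svc.cluster.local svc.cluster.local cluster.local

~# drill g4 # no result

~# drill g4.default.svc.cluster.local # this works; drill apparently does not append the search domains by default, so the full name has to be given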
g4.default.svc.cluster.local. 30 IN A 10.0.0.103

~# curl g4.default.svc.cluster.local # access by the full domain name works

WELCOME TO NGINX
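The search line in resolv.conf is what lets a short name such as g4 resolve in programs that go through the system resolver. The sketch below is a simplified Python model of how a typical stub resolver expands a name using `search` and `ndots`; the `candidates` function and the default of `ndots=1` are assumptions for illustration, and a tool like drill, as observed above, queries the name as given instead.

```python
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidates(name: str, search=SEARCH, ndots: int = 1):
    """List the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute, the search list is skipped
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # short name: try the search list first


if __name__ == "__main__":
    for fqdn in candidates("g4"):
        print(fqdn)
```

For the short name g4, the first candidate is the full service name that drill needed to be given by hand.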

On the master it is simpler

~$ curl g4 # the bare host name / service name works directly

WELCOME TO NGINX

Why can't the worker access the service address directly?

~# curl 10.0.0.103

~# tcpdump -ni any not port 22 and not port 4001 and not port 8080
12:05:45.073872 IP 192.168.1.12.50712 > 10.1.63.3.80: Flags [S], seq 352689382, win 29200, options [mss 1460,sackOK,TS val 8430068 ecr 0,nop,wscale 7], length 0

eth0 -> container SYN

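The nat OUTPUT chain takes effect here (the packet is generated locally by curl, so it traverses OUTPUT rather than PREROUTING): 10.0.0.103 is rewritten to 10.1.63.3. The original 192.168.1.12 -> 10.0.0.103 becomes 192.168.1.12 -> 10.1.63.3 after DNAT.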

12:05:45.074450 IP 192.168.1.12.8285 > 192.168.199.238.8285: UDP, length 60

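eth0 -> flannel tunnel

tcpdump seems to have missed some data here: the reply inside the flannel tunnel was not captured, even though the next line shows that a reply did arrive.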

12:05:45.084428 IP 10.1.63.3.80 > 192.168.1.12.50712: Flags [S.], seq 1751434350, ack 352689383, win 28960, options [mss 1260,sackOK,TS val 11149615 ecr 8430068,nop,wscale 6], length 0

eth0 <-- container SYN+ACK

However, the connection apparently was not established, because the same packets keep being resent below.

12:05:46.067321 IP 192.168.1.12.50712 > 10.1.63.3.80: Flags [S], seq 352689382, win 29200, options [mss 1460,sackOK,TS val 8430168 ecr 0,nop,wscale 7], length 0
12:05:46.067811 IP 192.168.1.12.8285 > 192.168.199.238.8285: UDP, length 60

Repeated SYN

12:05:46.077835 IP 10.1.63.3.80 > 192.168.1.12.50712: Flags [S.], seq 1751434350, ack 352689383, win 28960, options [mss 1260,sackOK,TS val 11149714 ecr 8430068,nop,wscale 6], length 0
12:05:47.074279 IP 10.1.63.3.80 > 192.168.1.12.50712: Flags [S.], seq 1751434350, ack 352689383, win 28960, options [mss 1260,sackOK,TS val 11149814 ecr 8430068,nop,wscale 6], length 0

Repeated SYN+ACK
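Spotting these retransmissions by eye gets tedious on longer captures. As a rough aid, the following Python sketch counts TCP packets that repeat the same endpoints, flags, and sequence number; the regular expression only covers the tcpdump line shape seen in this article, the `retransmissions` helper is made up for illustration, and the sample lines are the capture above with the TCP options trimmed.

```python
import re
from collections import Counter

# Matches lines such as: "IP a.b.c.d.port > e.f.g.h.port: Flags [S], seq 123, ..."
TCP_LINE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]+)\], seq (\d+)")

CAPTURE = [
    "12:05:45.073872 IP 192.168.1.12.50712 > 10.1.63.3.80: Flags [S], seq 352689382, win 29200",
    "12:05:45.074450 IP 192.168.1.12.8285 > 192.168.199.238.8285: UDP, length 60",
    "12:05:45.084428 IP 10.1.63.3.80 > 192.168.1.12.50712: Flags [S.], seq 1751434350, ack 352689383, win 28960",
    "12:05:46.067321 IP 192.168.1.12.50712 > 10.1.63.3.80: Flags [S], seq 352689382, win 29200",
    "12:05:46.067811 IP 192.168.1.12.8285 > 192.168.199.238.8285: UDP, length 60",
    "12:05:46.077835 IP 10.1.63.3.80 > 192.168.1.12.50712: Flags [S.], seq 1751434350, ack 352689383, win 28960",
    "12:05:47.074279 IP 10.1.63.3.80 > 192.168.1.12.50712: Flags [S.], seq 1751434350, ack 352689383, win 28960",
]


def retransmissions(lines):
    """Return {(src, dst, flags, seq): count} for TCP packets seen more than once."""
    seen = Counter()
    for line in lines:
        match = TCP_LINE.search(line)
        if match:  # UDP tunnel lines do not match and are skipped
            seen[match.groups()] += 1
    return {key: count for key, count in seen.items() if count > 1}


if __name__ == "__main__":
    for key, count in retransmissions(CAPTURE).items():
        print(count, key)
```

Both the SYN and the SYN+ACK show up more than once, which means neither side ever saw the handshake complete.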

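Capturing again, one interface at a time, shows that the SYN goes out on flannel0 but the SYN+ACK is received on eth0. These are two separate captures, so ignore the mismatched ports (50724 and 50722).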

~# tcpdump -ni flannel0 not port 22 and not port 4001 and not port 8080
12:50:42.739209 IP 192.168.1.12.50724 > 10.1.63.3.80: Flags [S], seq 3250104472, win 29200, options [mss 1460,sackOK,TS val 8699835 ecr 0,nop,wscale 7], length 0

~# tcpdump -ni eth0 not port 22 and not port 4001 and not port 8080
12:46:09.956699 IP 192.168.1.12.8285 > 192.168.199.238.8285: UDP, length 60
12:46:09.963917 IP 10.1.63.3.80 > 192.168.1.12.50722: Flags [S.], seq 2588662544, ack 2323701189, win 28960, options [mss 1260,sackOK,TS val 11392103 ecr 8672556,nop,wscale 6], length 0

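The request and the reply take different paths, so the translation set up by DNAT cannot be reversed correctly. This is why the SNAT discussed earlier is needed.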

To verify this, add a rule so that packets in both directions go through flannel0
~# iptables -t nat -A POSTROUTING -o flannel0 -j MASQUERADE

Now it works
~# curl 10.0.0.103    # or: curl g4.default.svc.cluster.local

WELCOME TO NGINX

Looking at the capture
~# tcpdump -ni flannel0 not port 22 and not port 4001 and not port 8080
13:03:23.483916 IP 10.1.9.0.50728 > 10.1.63.3.80: Flags [S], seq 3700831439, win 29200, options [mss 1460,sackOK,TS val 8775909 ecr 0,nop,wscale 7], length 0
13:03:23.502069 IP 10.1.63.3.80 > 10.1.9.0.50728: Flags [S.], seq 1824541104, ack 3700831440, win 28960, options [mss 1460,sackOK,TS val 11495457 ecr 8775909,nop,wscale 6], length 0
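The two experiments can be summarised in a small model. The Python sketch below is a simplification that merely reproduces what the captures show: it assumes the source address is chosen by routing the original (pre-DNAT) destination, and that replies to overlay addresses come back through flannel0 while everything else comes back through eth0. The `trace` function and the constant names are made up for illustration.

```python
import ipaddress

OVERLAY = ipaddress.ip_network("10.1.0.0/16")
WORKER_ETH0 = "192.168.1.12"
WORKER_FLANNEL0 = "10.1.9.0"
SERVICE = "10.0.0.103"
BACKEND = "10.1.63.3"


def iface_for(dst: str) -> str:
    """Overlay destinations use the tunnel, everything else uses eth0."""
    return "flannel0" if ipaddress.ip_address(dst) in OVERLAY else "eth0"


def trace(masquerade_on_flannel0: bool) -> dict:
    # 1. The source address is picked while the destination is still the service IP.
    src = WORKER_ETH0 if iface_for(SERVICE) == "eth0" else WORKER_FLANNEL0
    # 2. DNAT rewrites the destination, and the packet is routed again.
    out_iface = iface_for(BACKEND)
    # 3. The extra MASQUERADE rule rewrites the source on the way out of flannel0.
    if masquerade_on_flannel0 and out_iface == "flannel0":
        src = WORKER_FLANNEL0
    # 4. The reply is addressed to whatever source the backend saw.
    return {"src": src, "out": out_iface, "in": iface_for(src)}


if __name__ == "__main__":
    print(trace(masquerade_on_flannel0=False))
    print(trace(masquerade_on_flannel0=True))
```

Without the rule the SYN leaves on flannel0 with source 192.168.1.12 and the reply returns on eth0; with the rule both directions use flannel0 and the source becomes 10.1.9.0, matching the captures.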

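Why can the master access the service address directly?

This is really a special case: in this example all services run on the master, so the traffic never has to go through the flannel tunnel.

If a service ran on the worker, the master would not be able to access its service address directly either!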

$ sudo tcpdump -ni docker0 port 80
13:18:19.175684 IP 192.168.199.238.53301 > 10.1.63.3.80: Flags [S], seq 1284668269, win 29200, options [mss 1460,sackOK,TS val 11585025 ecr 0,nop,wscale 6], length 0
13:18:19.176108 IP 10.1.63.3.80 > 192.168.199.238.53301: Flags [S.], seq 2103662230, ack 1284668270, win 28960, options [mss 1460,sackOK,TS val 11585025 ecr 11585025,nop,wscale 6], length 0

Why can a container access the service address directly?

Containers all go through the docker0 interface, so the request and reply paths match.

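Three layers of addresses:

Containers refer to each other through service addresses in 10.0.0.0/24. Service addresses are mapped to container addresses, and the network driver connects the container addresses. Service addresses simplify how applications inside containers reference each other, container addresses simplify Kubernetes' own management, and the network service is responsible only for connectivity, so the division of labor is clear. The design neatly uses etcd for centralized configuration, iptables for address translation, and hands IP connectivity over to the tunnel service.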
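As a quick recap, the three layers can be told apart purely by address range. This is a minimal Python sketch under the ranges used in this article (10.0.0.0/24 for services, 10.1.0.0/16 for containers, everything else treated as a host address); the `layer` helper is made up for illustration.

```python
import ipaddress

LAYERS = [
    ("service", ipaddress.ip_network("10.0.0.0/24")),    # rewritten by iptables
    ("container", ipaddress.ip_network("10.1.0.0/16")),  # carried by flannel
]


def layer(ip: str) -> str:
    """Classify an address as service, container, or host."""
    addr = ipaddress.ip_address(ip)
    for name, network in LAYERS:
        if addr in network:
            return name
    return "host"  # physical addresses such as eth0


if __name__ == "__main__":
    for ip in ("10.0.0.103", "10.1.63.3", "192.168.199.238"):
        print(ip, layer(ip))
```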

Originally written: 2016.8