httpc 的链接池在multi microThread中失效引起高负载

本文主要涉及内容如下

  1. 系统负载高的一种可能
  2. 如何分析系统负载高
  3. httpc的使用注意

需要的pre知识

  1. tcpdump 的使用
  2. tcp 三次握手,四次挥手的具体表现
  3. linux perf的使用
  4. 连接池

背景

之前有一个离线任务是同步一些历史数据到新的数据库,这个数据库不在我们部门,对方提供了一个cgi的写数据存储的接口. 我最初的想法就是用go简单写一个, 写一个发请求的函数func handle,然后起for i in n { go handle } . 这样很简单就实现了并发请求.

遇到的问题

但是当我跑这个程序的时候,cpu非常高. 下面先说下条件

条件

  1. 机器信息
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7
    Thread(s) per core: 1
    Core(s) per socket: 8
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 94
    Model name: Intel(R) Xeon(R) Gold 61xx CPU
    Stepping: 3
    CPU MHz: 2494.140
    BogoMIPS: 4988.28
    Hypervisor vendor: KVM
    Virtualization type: full
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 4096K
    NUMA node0 CPU(s): 0-7

  2. 50 个microThread

  3. 一次处理延迟15 ms左右

  4. 稳定请求qps = 50000 q/min

  5. cpu : 30 % ~ 600 % 间循环波动

  6. 程序基本信息: 一个微线程 读取本地key列表文件, push到 chan, 50 个worker 微线程从chan 中读取key 后,根据key发出请求. 请求函数如下

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    func httpPost(qzCtx *svrCtx.Context,syncReq *SyncReq) int{
    if len(syncReq.Use_records) <= 0 {
    return 0
    }
    hc := httpc.New()
    hc.SetJSONBody(syncReq)
    sTime := time.Now()
    _,err := hc.Do(qzCtx)
    if err != nil{
    qzCtx.Fatal("do http err = %v",err)
    return -1
    }
    type RspStruct struct{
    Ret int
    Msg interface{}
    Err_records interface{}
    }
    var dat RspStruct
    err = hc.GetJSONBody(&dat);if err != nil{
    qzCtx.Fatal("get http rsp err %v",err)
    return -1
    }

    if dat.Ret != 0{
    qzCtx.Fatal("http fail rsp = %v,uin = %s",dat,syncReq.Use_records[0].Uid)
    return dat.Ret
    }else{
    return 0
    }
    return 0
    }

7 perf结果:

1
2
3
4
5
6
7
8
-  64.96%  WanbaDataSync  [kernel.kallsyms]    [k] _raw_spin_lock
- _raw_spin_lock
+ 64.71% __inet_hash_connect
+ 34.60% __inet_check_established
+ 0.66% inet_hash_connect
+ 17.49% WanbaDataSync [kernel.kallsyms] [k] __inet_check_established
+ 9.89% WanbaDataSync [kernel.kallsyms] [k] __inet_hash_connect
+ 3.16% WanbaDataSync [kernel.kallsyms] [k] tcp_twsk_unique

分析

  1. 看到这个perf结果,我第一反应就是 wtf ? 这么低的qps , 凭什么这么多cpu花在tcp 连接相关的函数上了.
  1. 系统的绝大部分时间花在spin_lock上了, 50个微线程竞争 port 导致了cpu高

  2. 这说明系统肯定在大量connect和close

  3. http是有长链接版的, go的httpc是否也有连接池或者长链接呢?

  4. 上官网查, 结果发现tcp是有连接池的, 那这个httpc应该也是有连接池的?

  5. 但是perf的结果告诉我们不是这样的, 那来抓包咯

  6. sudo tcpdump -iany port 10121 -Xnlps0 | grep “\[F\.\]” 我只要抓到fin包就可以知道close函数的调用信息了. 如果真的有连接池,那么fin包必然是很少的,结果我这个命令一敲下去,fin包就刷屏了. 这说明我的猜测初步是正确的,但是文档上说有连接池,肯定也不会骗我,估计是我哪里用错了, 那再整一个简单版的实验一下. fin包就是tcp协议中的四次挥手的时候会出现的一种协议包,一般来说close system call会弄出这种包.

htpc test

这里放两个版本,一个是单线程循环发包, 一个是10线程并发发包, 然后再来抓包看fin包的情况

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
func Doreq(syncReq *SyncReq) int{
if len(syncReq.Use_records) <= 0 {
return 0
}
hc := httpc.New()
hc.SetJSONBody(syncReq)
_,err := hc.Do(context.Background())
if err != nil{
fmt.Printf("do http err = %v",err)
return -1
}
type RspStruct struct{
Ret int
Msg interface{}
Err_records interface{}
}
var dat RspStruct
err = hc.GetJSONBody(&dat);if err != nil{
fmt.Printf("get http rsp err %v",err)
return -1
}

if dat.Ret != 0{
fmt.Printf("http fail rsp = %v,uin = %s",dat,syncReq.Use_records[0].Uid)
return dat.Ret
}
return 0
}

func main(){
ele := OneEle{"834425151","123456",1552800702}
req := SyncReq{}
req.Use_records = append(req.Use_records,ele)
for i := 0; i < 10;i++ {
//这个地方,单线程版的,去掉go编译即可
go Doreq(&req)
}
time.Sleep(time.Second * 3)
}

抓包命令:

1
sudo tcpdump -iany port 10121 -Xnlps0 | grep "\[F\.\]"

单线程的抓包结果

1
2
3
4
5
6
13:44:00.832192 IP 9.137.218.46.41582 > 9.137.182.90.10121: Flags [F.], seq 1065, ack 645, win 37, length 0
13:44:00.832207 IP 9.137.218.46.21216 > 10.101.83.143.10121: Flags [F.], seq 802, ack 484, win 35, length 0
13:44:00.832223 IP 9.137.218.46.44976 > 10.101.82.89.10121: Flags [F.], seq 799, ack 484, win 35, length 0
13:44:00.832453 IP 9.137.182.90.10121 > 9.137.218.46.41582: Flags [F.], seq 645, ack 1066, win 74, length 0
13:44:00.832878 IP 10.101.83.143.10121 > 9.137.218.46.21216: Flags [F.], seq 484, ack 803, win 70, length 0
13:44:00.832896 IP 10.101.82.89.10121 > 9.137.218.46.44976: Flags [F.], seq 484, ack 800, win 70, length 0

多线程的抓包结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
13:47:11.219686 IP 9.137.218.46.45093 > 10.101.82.89.10121: Flags [F.], seq 267, ack 162, win 31, length 0
13:47:11.219985 IP 9.137.218.46.21335 > 10.101.83.143.10121: Flags [F.], seq 268, ack 162, win 31, length 0
13:47:11.220390 IP 10.101.82.89.10121 > 9.137.218.46.45093: Flags [F.], seq 162, ack 268, win 62, length 0
13:47:11.220601 IP 10.101.83.143.10121 > 9.137.218.46.21335: Flags [F.], seq 162, ack 269, win 62, length 0
13:47:11.221653 IP 9.137.218.46.21332 > 10.101.83.143.10121: Flags [F.], seq 268, ack 162, win 31, length 0
13:47:11.222266 IP 10.101.83.143.10121 > 9.137.218.46.21332: Flags [F.], seq 162, ack 269, win 62, length 0
13:47:11.227169 IP 9.137.218.46.41700 > 9.137.182.90.10121: Flags [F.], seq 267, ack 162, win 31, length 0
13:47:11.227328 IP 9.137.182.90.10121 > 9.137.218.46.41700: Flags [F.], seq 162, ack 268, win 62, length 0
13:47:14.208108 IP 9.137.218.46.45089 > 10.101.82.89.10121: Flags [F.], seq 267, ack 162, win 31, length 0
13:47:14.208127 IP 9.137.218.46.21331 > 10.101.83.143.10121: Flags [F.], seq 268, ack 162, win 31, length 0
13:47:14.208130 IP 9.137.218.46.41706 > 9.137.182.90.10121: Flags [F.], seq 267, ack 162, win 31, length 0
13:47:14.208133 IP 9.137.218.46.45095 > 10.101.82.89.10121: Flags [F.], seq 267, ack 162, win 31, length 0
13:47:14.208137 IP 9.137.218.46.41704 > 9.137.182.90.10121: Flags [F.], seq 267, ack 162, win 31, length 0
13:47:14.208139 IP 9.137.218.46.21338 > 10.101.83.143.10121: Flags [F.], seq 268, ack 162, win 31, length 0
13:47:14.208428 IP 9.137.182.90.10121 > 9.137.218.46.41706: Flags [F.], seq 162, ack 268, win 62, length 0
13:47:14.208449 IP 9.137.182.90.10121 > 9.137.218.46.41704: Flags [F.], seq 162, ack 268, win 62, length 0
13:47:14.208827 IP 10.101.82.89.10121 > 9.137.218.46.45089: Flags [F.], seq 162, ack 268, win 62, length 0
13:47:14.208836 IP 10.101.83.143.10121 > 9.137.218.46.21331: Flags [F.], seq 162, ack 269, win 62, length 0
13:47:14.208842 IP 10.101.82.89.10121 > 9.137.218.46.45095: Flags [F.], seq 162, ack 268, win 62, length 0
13:47:14.208881 IP 10.101.83.143.10121 > 9.137.218.46.21338: Flags [F.], seq 162, ack 269, win 62, length 0

可以看到,在单线程的时候, httpc 确实只发出了少量的fin包,那说明复用了链接,也就是连接池起作用了,但是在多线程的时候, httpc一共发出去10个fin包,和代码中写的10个请求正好相等,这说明连接池完全没起作用