A Production-Grade Reliable UDP Library for Golang

Updated: 2022-09-19 11:36:11

Introduction

kcp-go is a Production-Grade Reliable-UDP library for golang.

This library intends to provide smooth, resilient, ordered, error-checked and anonymous delivery of streams over UDP packets. It has been battle-tested with the open-source project kcptun. Millions of devices (from low-end MIPS routers to high-end servers) have deployed kcp-go powered programs in a variety of forms, such as online games, live broadcasting, file synchronization and network acceleration.

Latest Release

Features

Designed for latency-sensitive scenarios.
Cache-friendly, memory-optimized design with a high-performance core.
Handles >5K concurrent connections on a single commodity server.
Compatible with net.Conn and net.Listener, so it can serve as a drop-in replacement for net.TCPConn.
FEC (Forward Error Correction) support with Reed-Solomon codes.
Packet-level encryption support with AES, TEA, 3DES, Blowfish, Cast5, Salsa20, etc. in CFB mode, which generates completely anonymous packets.
Only a fixed number of goroutines is created for the entire server application; the cost of context switches between goroutines has been taken into consideration.

Compatible with skywind3000's C version with various improvements.

Documentation

For complete documentation, see the associated Godoc.

Specification

+-----------------+
| SESSION         |
+-----------------+
| KCP(ARQ)        |
+-----------------+
| FEC(OPTIONAL)   |
+-----------------+
| CRYPTO(OPTIONAL)|
+-----------------+
| UDP(PACKET)     |
+-----------------+
| IP              |
+-----------------+
| LINK            |
+-----------------+
| PHY             |
+-----------------+

(LAYER MODEL OF KCP-GO)

Usage

Client: full demo

kcpconn, err := kcp.DialWithOptions("192.168.0.1:10000", nil, 10, 3)

Server: full demo

lis, err := kcp.ListenWithOptions(":10000", nil, 10, 3)

Benchmark

Model Name:             MacBook Pro
Model Identifier:       MacBookPro14,1
Processor Name:         Intel Core i5
Processor Speed:        3.1 GHz
Number of Processors:   1
Total Number of Cores:  2
L2 Cache (per Core):    256 KB
L3 Cache:               4 MB
Memory:                 8 GB

$ go test -v -run=^$ -bench .
beginning tests, encryption:salsa20, fec:10/3
goos: darwin
goarch: amd64
pkg: github.com/xtaci/kcp-go
BenchmarkSM4-4                      50000      32180 ns/op     93.23 MB/s        0 B/op       0 allocs/op
BenchmarkAES128-4                  500000       3285 ns/op    913.21 MB/s        0 B/op       0 allocs/op
BenchmarkAES192-4                  300000       3623 ns/op    827.85 MB/s        0 B/op       0 allocs/op
BenchmarkAES256-4                  300000       3874 ns/op    774.20 MB/s        0 B/op       0 allocs/op
BenchmarkTEA-4                     100000      15384 ns/op    195.00 MB/s        0 B/op       0 allocs/op
BenchmarkXOR-4                   20000000       89.9 ns/op  33372.00 MB/s        0 B/op       0 allocs/op
BenchmarkBlowfish-4                 50000      26927 ns/op    111.41 MB/s        0 B/op       0 allocs/op
BenchmarkNone-4                  30000000       45.7 ns/op  65597.94 MB/s        0 B/op       0 allocs/op
BenchmarkCast5-4                    50000      34258 ns/op     87.57 MB/s        0 B/op       0 allocs/op
Benchmark3DES-4                     10000     117149 ns/op     25.61 MB/s        0 B/op       0 allocs/op
BenchmarkTwofish-4                  50000      33538 ns/op     89.45 MB/s        0 B/op       0 allocs/op
BenchmarkXTEA-4                     30000      45666 ns/op     65.69 MB/s        0 B/op       0 allocs/op
BenchmarkSalsa20-4                 500000       3308 ns/op    906.76 MB/s        0 B/op       0 allocs/op
BenchmarkCRC32-4                 20000000       65.2 ns/op  15712.43 MB/s
BenchmarkCsprngSystem-4           1000000       1150 ns/op     13.91 MB/s
BenchmarkCsprngMD5-4             10000000        145 ns/op    110.26 MB/s
BenchmarkCsprngSHA1-4            10000000        158 ns/op    126.54 MB/s
BenchmarkCsprngNonceMD5-4        10000000        153 ns/op    104.22 MB/s
BenchmarkCsprngNonceAES128-4    100000000       19.1 ns/op    837.81 MB/s
BenchmarkFECDecode-4              1000000       1119 ns/op   1339.61 MB/s     1606 B/op       2 allocs/op
BenchmarkFECEncode-4              2000000        832 ns/op   1801.83 MB/s       17 B/op       0 allocs/op
BenchmarkFlush-4                  5000000        272 ns/op                       0 B/op       0 allocs/op
BenchmarkEchoSpeed4K-4               5000     259617 ns/op     15.78 MB/s     5451 B/op     149 allocs/op
BenchmarkEchoSpeed64K-4              1000    1706084 ns/op     38.41 MB/s    56002 B/op    1604 allocs/op
BenchmarkEchoSpeed512K-4              100   14345505 ns/op     36.55 MB/s   482597 B/op   13045 allocs/op
BenchmarkEchoSpeed1M-4                 30   34859104 ns/op     30.08 MB/s  1143773 B/op   27186 allocs/op
BenchmarkSinkSpeed4K-4              50000      31369 ns/op    130.57 MB/s     1566 B/op      30 allocs/op
BenchmarkSinkSpeed64K-4              5000     329065 ns/op    199.16 MB/s    21529 B/op     453 allocs/op
BenchmarkSinkSpeed256K-4              500    2373354 ns/op    220.91 MB/s   166332 B/op    3554 allocs/op
BenchmarkSinkSpeed1M-4                300    5117927 ns/op    204.88 MB/s   310378 B/op    6988 allocs/op
PASS
ok      github.com/xtaci/kcp-go 50.349s

Key Design Considerations

slice vs. container/list

kcp.flush() loops through the send queue every 20ms (the interval) to check for segments that need retransmission.

I've written a benchmark comparing sequential traversal of a slice and of a container/list here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/cachemiss_test.go

BenchmarkLoopSlice-4 2000000000 0.39 ns/op

BenchmarkLoopList-4 100000000 54.6 ns/op

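A list introduces heavy cache misses compared to a slice, which has better locality. With 5000 connections, a window size of 32 and a 20ms interval, each round of kcp.flush() visits 5000 * 32 = 160,000 elements. At 0.39 ns per element a slice costs about 62us (roughly 0.3% of the interval), while at 54.6 ns per element a list costs about 8.7ms (43.5% of the interval).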
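
The per-round cost can be recomputed directly from the two benchmark figures. The sketch below does only that arithmetic; scanCost is a hypothetical helper, and it assumes every connection has a full window of 32 segments queued.

```go
package main

import "fmt"

// scanCost (hypothetical helper) returns the estimated time in nanoseconds
// to visit every queued segment once, and that time as a percentage of the
// flush interval.
func scanCost(conns, window int, nsPerElem, intervalNs float64) (costNs, pct float64) {
	costNs = float64(conns*window) * nsPerElem
	pct = costNs / intervalNs * 100
	return costNs, pct
}

func main() {
	const intervalNs = 20e6 // 20 ms expressed in nanoseconds

	cases := []struct {
		name      string
		nsPerElem float64
	}{
		{"slice", 0.39}, // BenchmarkLoopSlice-4
		{"list", 54.6},  // BenchmarkLoopList-4
	}
	for _, c := range cases {
		cost, pct := scanCost(5000, 32, c.nsPerElem, intervalNs)
		fmt.Printf("%-5s %9.1f us per round, %6.2f%% of the interval\n", c.name, cost/1e3, pct)
	}
}
```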

Timing accuracy vs. syscall clock_gettime

Timing is critical to the RTT estimator: inaccurate timing leads to false retransmissions in KCP. However, calling time.Now() costs 42 cycles (10.5ns on a 4GHz CPU, 15.6ns on my 2.7GHz MacBook Pro).

The benchmark for time.Now() lies here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/syscall_test.go

BenchmarkNow-4 	100000000	 15.6 ns/op

In kcp-go, the current clock time is updated on return from each kcp.output() call, and a single kcp.flush() operation queries the system time only once. Most of the time, 5000 connections cost 5000 * 15.6ns = 78us (a fixed cost even when no packet needs to be sent). When transferring data at 10MB/s with a 1400-byte MTU, kcp.output() is called around 7500 times per second, which costs about 117us of time.Now() calls every second.

Connection Termination

KCP does not define control messages such as TCP's SYN/FIN/RST, so you need a keepalive/heartbeat mechanism at the application level. A real-world approach is to run a multiplexing protocol over the session, such as smux (which has an embedded keepalive mechanism); see kcptun for an example.

FAQ

Q: I'm handling >5K connections on my server, and the CPU utilization is very high.

A: Running kcp-go on a standalone agent or gate server is suggested, not only because of CPU utilization but also because it matters for the precision of RTT measurements (timing), which indirectly affects retransmission. Increasing the update interval with SetNoDelay, for example conn.SetNoDelay(1, 40, 1, 1), will dramatically reduce system load at the cost of lower performance.
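
To get a rough feel for why a longer interval reduces load, the number of flush passes per second can be estimated as below. This is simple arithmetic under the assumption that each session is flushed exactly once per interval; flushesPerSecond is a hypothetical helper, and the second argument corresponds to the interval in milliseconds passed as the second parameter of SetNoDelay.

```go
package main

import "fmt"

// flushesPerSecond (hypothetical helper) estimates how many flush passes run
// per second across conns sessions when each one is updated every
// intervalMs milliseconds.
func flushesPerSecond(conns, intervalMs int) int {
	return conns * 1000 / intervalMs
}

func main() {
	for _, interval := range []int{20, 40} {
		fmt.Printf("interval %2d ms: %d flush passes per second for 5000 connections\n",
			interval, flushesPerSecond(5000, interval))
	}
}
```

Doubling the interval halves the number of passes, but a segment may also wait up to one interval longer before it is sent or retransmitted, which is the performance cost mentioned above.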

Who is using this?

https://github.com/xtaci/kcptun -- A Secure Tunnel Based On KCP over UDP.
https://github.com/get***/*** -- *** delivers fast access to the open Internet.
https://github.com/smallnest/rpcx -- An RPC service framework based on net/rpc, like Alibaba Dubbo and Weibo Motan.
https://github.com/gonet2/agent -- A gateway for games with stream multiplexing.
https://github.com/syncthing/syncthing -- Open Source Continuous File Synchronization.

https://play.google.com/store/apps/details?id=com.k17game.k3 -- Battle Zone - Earth 2048, a world-wide strategy game.

Links

https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++
https://github.com/skywind3000/kcp -- A Fast and Reliable ARQ Protocol
https://github.com/klauspost/reedsolomon -- Reed-Solomon Erasure Coding in Go

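Original publication date: 2018-10-13

This article comes from "Golang语言社区" (Golang Language Community), a partner of the Yunqi Community; follow "Golang语言社区" for related information.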
