commit 6ef6ccecaaae5921473f73f09213830c14a5eb6c
Author: SoniEx2
Date:   Sat Feb 9 15:40:38 2019 -0200

    Add tcpquic.md

diff --git a/tcpquic.md b/tcpquic.md
new file mode 100644
index 0000000..98377ed
--- /dev/null
+++ b/tcpquic.md
@@ -0,0 +1,228 @@
+# Connection Latency
+
+Hi! Let's say you go to visit your favorite website, and it takes 3 seconds to show up. You run a speed test, and your
+connection is doing fine at around 100 Mbps. So how come it took 3 seconds to show up? That's what connection latency is.
+
+Let's say our network has 3 nodes:
+
+```
+your phone ---------> our router ---------> the server
+           <---------            <---------
+```
+
+(This is a rather simplistic model; real networks are a lot more complicated. But it'll work for this demonstration.)
+
+Let's say sending a packet between two adjacent nodes takes n seconds. This means that for a packet to go from you to
+the server, it takes 2\*n seconds (n seconds to go from you to our router, then another n to go from our router to the
+server). As such, we want to keep this n very low. Ideally, it'd be 0, but in practice we're limited by things like the
+speed of light.
+
+```
+your phone ---------> our router ---------> the server
+           <---------            <--DATA---
+
+your phone ---------> our router ---------> the server
+           <--DATA---            <---------
+```
+
+So you might be thinking, "it takes 3 seconds for the server to send me the page?! n is 1.5s?!" No, not quite.
+
+Before the server can send you anything, it first needs to know what you want, so you have to send it a request first.
+That request also takes 2n to arrive, so 4n = 3s: our n is no longer 1.5s but 0.75s instead.
+
+```
+your phone ---HTTP--> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---HTTP--> the server
+           <---------            <---------
+
+your phone ---------> our router ---------> the server
+           <---------            <--DATA---
+
+your phone ---------> our router ---------> the server
+           <--DATA---            <---------
+```
+
+
+But that's not all!
+You can't just ask the server to send you stuff and have it send you stuff! There are things like
+spoofing that we need to be concerned about, otherwise we'd get massive DDoS amplification attacks! Instead, meet TCP:
+
+## TCP Connection
+
+Transmission Control Protocol, or TCP, is the protocol used to prevent evil hackers from bringing down the internet. It
+accomplishes that by employing a 3-way handshake. So, how does it work? Well, first, you ask for a connection. This is
+called a SYN in TCP:
+
+```
+your phone ---SYN---> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---SYN---> the server
+           <---------            <---------
+```
+
+This lets the server know you want to send data.
+
+When the server receives the SYN, it then tells you that it got the SYN, and asks *you* for a connection. This is called
+a SYN-ACK:
+
+```
+your phone ---------> our router ---------> the server
+           <---------            <-SYN-ACK-
+
+your phone ---------> our router ---------> the server
+           <-SYN-ACK-            <---------
+```
+
+This lets you know the server wants to send data, and acknowledges that you want to send data. But we're not quite done yet.
+
+We still need to acknowledge that the server wants to send data. So, we send an ACK:
+
+
+```
+your phone ---ACK---> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---ACK---> the server
+           <---------            <---------
+```
+
+*Now you can get your data.*
+
+We took 6n to get a connection, and 2n to get our data... and another 2n to request the data. As such, 10n = 3s, or
+n = 0.3s... So if we were to simply send a packet to the server and get a similar packet back, it'd take about 1.2s. However,
+we're not quite done yet. Before your phone can talk to the server, it needs to know where the server is. When you type an
+address into the browser's address bar, that's only the name of the server - we need instructions to get the packets there.
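The hop counting so far can be sketched in a few lines of Python. This is only an illustration of the simplified three-node model used here: the names `HANDSHAKE_HOPS`, `REQUEST_HOPS`, `RESPONSE_HOPS` and `per_hop_latency` are made up for this sketch, and the 3 second total is the figure assumed throughout.

```python
# Hop counts read off the diagrams above. Every message crosses two links
# (phone <-> router <-> server), and each link costs n seconds.
HANDSHAKE_HOPS = {"SYN": 2, "SYN-ACK": 2, "ACK": 2}
REQUEST_HOPS = 2   # the HTTP request travelling to the server
RESPONSE_HOPS = 2  # the DATA travelling back to the phone


def per_hop_latency(total_seconds, hops):
    """Solve hops * n = total_seconds for n."""
    return total_seconds / hops


total_hops = sum(HANDSHAKE_HOPS.values()) + REQUEST_HOPS + RESPONSE_HOPS
n = per_hop_latency(3.0, total_hops)

# A bare round trip (one packet there, one packet back) crosses 4 links.
round_trip = 4 * n

print(f"{total_hops} hops, n = {n:.2f}s, round trip = {round_trip:.2f}s")
```

Changing the hop counts or the total is an easy way to see how much of the 3 seconds is handshake overhead rather than actual data transfer.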
+
+This is where DNS comes in:
+
+## DNS Queries
+
+Domain Name System, or DNS, is the protocol that takes a domain name and converts it into an IP address - the latter is
+basically a map/instructions on how to get the packets to the destination.
+
+Thankfully, DNS answers are usually cached in the router. Additionally, DNS usually doesn't use TCP, so there's no 3-way
+handshake.
+
+```
+your phone ---NAME--> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---------> the server
+           <---IP----            <---------
+```
+
+If the router doesn't know a name, it has to ask an upstream DNS server about it. However, this generally only happens once
+every few hours, so it's not something we have to worry about.
+
+This adds another 2n to our time. We're up to 12n = 3s, or n = 0.25s. It only takes 1 second to send a packet to the server
+and get it back! TCP is awful! ... Not so fast, tho. You might've noticed that the network is busy with only one packet at a
+time. Maybe we can do something to improve this. Okay, we can't improve the DNS query, as it's required to happen before we
+can do anything. But can we improve the TCP? What if we terminate the TCP at the router?
+
+## Terminating the TCP at the router
+
+While not strictly allowed by the internet specifications, it's not strictly disallowed either.
+If implemented, our flow can look like this:
+
+```
+your phone ---NAME--> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---------> the server
+           <---IP----            <---------
+
+your phone ---SYN---> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---SYN---> the server
+           <-SYN-ACK-            <---------
+
+your phone ---ACK---> our router ---------> the server
+           <---------            <-SYN-ACK-
+
+your phone ---HTTP--> our router ---ACK---> the server
+           <---------            <---------
+
+your phone ---------> our router ---HTTP--> the server
+           <---------            <---------
+
+your phone ---------> our router ---------> the server
+           <---------            <--DATA---
+
+your phone ---------> our router ---------> the server
+           <--DATA---            <---------
+```
+
+We're down from 12n to only 9n! With our n = 0.25s, we've shaved off 0.75s from our original 3s! This is a noticeable
+improvement.
+
+However, you might've noticed I've been talking about `HTTP` so far. Additionally, you can have both an ACK and an HTTP in
+transit at the same time; this shaves off 1n from both our original 12n and our 9n, so we have 11n = 3s and an improvement
+of approximately 0.82s. So it's even slightly better.
+
+HTTPS, on the other hand, also has its own handshake after TCP's. I don't wanna get into this, because you can probably see
+how ridiculous it's getting by now. This handshake can also be partially terminated by the router, so *it* can also be
+optimized slightly, and we can shave off more n's.
+
+But let's look at QUIC real quick:
+
+## QUIC
+
+(I don't know what QUIC stands for.)
+
+QUIC is a protocol that does something similar to TCP, with one major difference: it uses UDP.
+
+User Datagram Protocol, or UDP, is also used by DNS (see above). This means it has no handshake. QUIC implements its own
+handshake, on top of UDP.
+This means QUIC is basically like TCP, but it comes with a serious caveat: being UDP-based, it
+DOESN'T benefit from our TCP optimization from earlier!
+
+As such, going QUIC over existing networks has one serious drawback: it adds back those 3n that we were able to shave off!
+And if we optimize for QUIC in addition to TCP, we still only manage to shave off those 3n again.
+
+So, is there any room for improvement? Can we shave off more n's?
+
+... Maybe. It would require some changes to the web. More specifically, what if the router could serve some of the content
+directly, without ever reaching the server?
+
+That's where we need to change the protocols slightly:
+
+## Terminating "HTTP" at the router
+
+Rather than terminating just TCP at the router, can we go one step further?
+
+Can we create a protocol such that the great majority of the connections look more like this:
+
+```
+your phone ---NAME--> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---------> the server
+           <---IP----            <---------
+
+your phone ---SYN---> our router ---------> the server
+           <---------            <---------
+
+your phone ---------> our router ---SYN---> the server
+           <-SYN-ACK-            <---------
+
+your phone ---ACK---> our router ---------> the server
+           <---------            <-SYN-ACK-
+
+your phone --NHTTP--> our router ---ACK---> the server
+           <---------            <---------
+
+your phone ---------> our router ---------> the server
+           <--DATA---            <---------
+```
+
+(shave off another n if you combine the ACK and the NHTTP)
+
+We just managed to shave off another 2n! While this requires extensive changes to the existing infrastructure, the load
+times go from the 2.25s/2.18s from our "Terminating TCP at the router" to an even lower 1.75s/1.64s! That's not much more
+than half the original 3s!
+However, this improvement doesn't apply as widely as our "Terminating TCP at the router" and "Partially terminating
+HTTPS at the router" - you want your private data to go encrypted all the way to the server, so anything dealing with
+private data would be back to the original 3s/2.25s/2.18s depending on optimizations. This is okay tho, as most data on the
+web - images, videos, HTML (page layout/behaviour), CSS (also page layout), JavaScript (also page behaviour) - is generally
+not private. For example, your neighbor probably watches the same videos as you - thus the videos are not private - but your
+bank statement is exclusive to you - and as such, private.
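To compare the scenarios side by side, here is a rough Python sketch of the arithmetic. It assumes the same simplified three-node model and the 3 second baseline used throughout; the names `SCENARIOS`, `BASELINE_HOPS` and `load_times`, and the `merged_ack` flag, are invented for this illustration and do not come from any real tool.

```python
TOTAL_SECONDS = 3.0  # observed load time of the unoptimized flow
BASELINE_HOPS = 12   # DNS (2) + TCP handshake (6) + request (2) + data (2)

# Hop counts read off the diagrams in each section.
SCENARIOS = {
    "baseline": 12,
    "tcp_at_router": 9,
    "http_at_router": 7,
}


def load_times(scenarios, baseline_hops, total_seconds, merged_ack=False):
    """Estimate the load time of each scenario in seconds.

    merged_ack=True models sending the final ACK together with the request,
    which removes one hop from every scenario, the baseline included.
    """
    saved = 1 if merged_ack else 0
    # Calibrate n so that the baseline flow takes exactly total_seconds.
    n = total_seconds / (baseline_hops - saved)
    return {name: (hops - saved) * n for name, hops in scenarios.items()}


separate = load_times(SCENARIOS, BASELINE_HOPS, TOTAL_SECONDS)
merged = load_times(SCENARIOS, BASELINE_HOPS, TOTAL_SECONDS, merged_ack=True)

for name in SCENARIOS:
    print(f"{name:15s} {separate[name]:.2f}s  {merged[name]:.2f}s (ACK merged)")
```

The sketch recalibrates n whenever the ACK is merged, since the 3 second baseline is then spread over 11 hops instead of 12.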