In this post, I will digress from my usual topic of C++ to jot down my thoughts about TCP and SSL. I have limited knowledge of networking and an even more limited understanding of security, so my ramblings here might be full of flaws and security holes. Nevertheless, I thought it would be fun to share a random idea on how to get the web to run a little faster.

Background

SSL is used by HTTPS (and others) to secure the pipe between the browser and the web server. Once used primarily by sites performing financial transactions (banks, ecommerce), it is increasingly used by services that require a login (e.g. Gmail, Twitter). As such, a fast HTTPS connection is more important than ever. An HTTPS connection starts out with a TCP 3-way handshake, followed by an SSL/TLS handshake. Let's take a quick look at each one in more detail.

TCP 3-way Handshake

The TCP 3-way handshake starts with the client sending a SYN packet to the server. Upon receipt, the server replies with its own SYN packet but also piggybacks the acknowledgement (that it received the client's SYN) in the same packet. Thus, the server replies with a SYN-ACK packet. Finally, when the client receives the server's SYN-ACK packet, it replies with an ACK to signify that it has received the SYN from the server. At this point the TCP connection is established and the client can immediately start sending data. Therefore, the latency cost of connection establishment is 1 round trip when the client sends the first message (as in HTTP) and 1.5 round trips when the server is the first to speak its mind (e.g. POP3).

There are a number of reasons for the SYN packets and the 3-way handshake. First, the SYN (synchronize) packets are used to exchange the initial sequence numbers that both parties will use to ensure transport reliability (and flow control). Second, the handshake is used to detect old (duplicate) SYN packets arriving at the host by asking the sender to validate them (see RFC793, Figure 8). Lastly, the handshake is used as a security measure. For example, suppose a TCP server is sitting behind a firewall that only accepts traffic from IP 66.55.44.33. If the connection were established by the very first SYN packet, an attacker could craft a TCP/IP packet whose source IP is 66.55.44.33 instead of his own and put the data in that very packet. The server would receive the SYN+data, create a connection and forward the data to the application. The SYN-ACK, however, is sent to the claimed source address, so only a host that actually receives it can acknowledge the server's sequence number in the final ACK packet. By expecting that acknowledgement, the server can be confident that the source IP was not spoofed.
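
To make the sequence-number check concrete, here is a minimal sketch in Python. It is a toy model and not a real TCP stack: the Segment and Server names are made up for illustration, half-open connections are keyed by IP only, and retransmission and timeouts are ignored.

```python
import secrets
from dataclasses import dataclass

SEQ_MOD = 2**32  # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    """Hypothetical, stripped-down TCP segment (flags and numbers only)."""
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_no: int = 0


class Server:
    """Toy server that establishes a connection only after a valid final ACK."""

    def __init__(self):
        self.half_open = {}  # claimed source IP -> server's initial sequence number

    def on_syn(self, src_ip, segment):
        isn = secrets.randbelow(SEQ_MOD)  # unpredictable initial sequence number
        self.half_open[src_ip] = isn
        # The SYN-ACK travels to the *claimed* source address.
        return Segment(syn=True, ack=True, seq=isn,
                       ack_no=(segment.seq + 1) % SEQ_MOD)

    def on_ack(self, src_ip, segment):
        isn = self.half_open.get(src_ip)
        if isn is None or not segment.ack:
            return False
        if segment.ack_no != (isn + 1) % SEQ_MOD:
            return False  # the sender never saw our SYN-ACK
        del self.half_open[src_ip]
        return True


server = Server()

# Honest client: receives the SYN-ACK and acknowledges the server's number.
syn_ack = server.on_syn("66.55.44.33", Segment(syn=True, seq=1000))
honest_ok = server.on_ack(
    "66.55.44.33",
    Segment(ack=True, seq=1001, ack_no=(syn_ack.seq + 1) % SEQ_MOD))

# Spoofer: the SYN-ACK went to 66.55.44.33, so the attacker has to guess.
reply = server.on_syn("66.55.44.33", Segment(syn=True, seq=5000))
wrong_guess = (reply.seq + 2) % SEQ_MOD  # any value other than reply.seq + 1
spoof_ok = server.on_ack(
    "66.55.44.33", Segment(ack=True, seq=5001, ack_no=wrong_guess))

print(honest_ok, spoof_ok)  # prints: True False
```

A blind attacker would have to hit one value out of 2^32 per attempt, which is why the unpredictability of the server's initial sequence number matters in this model.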

Interestingly, RFC793 does allow data to be included in the SYN packets. However, for the reasons described above, it requires the stack to queue the data and deliver it to the application only after a successful 3-way handshake.

SSL/TLS Handshake

Once a TCP connection is established, the SSL handshake begins with the client sending a ClientHello message containing the protocol version, random bits, and a list of cipher suites it supports (e.g. TLS_RSA_WITH_AES_256_CBC_SHA). It can also include a number of extension fields. The server responds with a ServerHello containing the selected version and cipher suite, random bits and extension fields. The server then follows up with its certificate and a ServerHelloDone message. These three messages can end up within one TCP packet. Finally, the client sends the cipher key (actually bits that will be used to compute it) and both the client and the server send messages to switch from plain-text to encrypted communication.

The takeaway here is that, just like the TCP handshake, SSL also requires several packets to be exchanged before application-level communication can commence. These exchanges add another 2 round trips' worth of latency.

Proposal: Combine TCP and SSL Handshakes

What if the TCP and SSL handshakes could be combined to save a round trip? Conceptually, TCP resides at layer 4 (transport) of the OSI reference model. Since it seems that nobody can quite figure out where SSL fits in, it doesn't seem all that unnatural to create a "Secure TCP" protocol (not unlike IPsec). The SSL handshake can then begin with the very first SYN packet:

Client                                                     Server
~~~~~~                                                     ~~~~~~
TCP SYN, SSL ClientHello -->
            <-- TCP SYN+ACK, SSL ServerHello+Cert+ServerHelloDone 
TCP ACK, SSL ClientKeyExch+... -->
                            <-- TCP ACK, SSL ChangeCipherSpec+...

This scheme would almost have to push SSL processing into the kernel (since that's where TCP is usually handled) but perhaps some hybrid solution could be implemented (e.g. PKI done in userspace).
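
As a rough sanity check on the claimed saving, the sketch below counts round trips for the classic and the combined exchanges. It is only a model of message flights: the SEPARATE and COMBINED lists and the round_trips helper are names I made up, and the classic list assumes the client sends its ClientHello together with (or right behind) the final TCP ACK.

```python
# Each flight is (sender, description). "C" = client, "S" = server.
SEPARATE = [
    ("C", "TCP SYN"),
    ("S", "TCP SYN+ACK"),
    ("C", "TCP ACK, SSL ClientHello"),
    ("S", "SSL ServerHello+Cert+ServerHelloDone"),
    ("C", "SSL ClientKeyExch+..."),
    ("S", "SSL ChangeCipherSpec+..."),
]

COMBINED = [
    ("C", "TCP SYN, SSL ClientHello"),
    ("S", "TCP SYN+ACK, SSL ServerHello+Cert+ServerHelloDone"),
    ("C", "TCP ACK, SSL ClientKeyExch+..."),
    ("S", "TCP ACK, SSL ChangeCipherSpec+..."),
]


def round_trips(flights):
    """Round trips the client waits before it can send application data.

    Assumes flights strictly alternate, start with the client and end with
    the server, so each client/server pair costs one full round trip.
    """
    senders = [sender for sender, _ in flights]
    expected = ["C", "S"] * (len(flights) // 2)
    if senders != expected:
        raise ValueError("flights must alternate C, S, C, S, ...")
    return len(flights) // 2


saved = round_trips(SEPARATE) - round_trips(COMBINED)
print(round_trips(SEPARATE), round_trips(COMBINED), saved)  # prints: 3 2 1
```

Under these assumptions the classic setup costs 3 round trips (1 for TCP plus 2 for SSL) and the combined one costs 2.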

Backward Compatibility

Recall that the original TCP specification (RFC793) allows data to be exchanged in SYN packets (I am not sure, though, how OS kernels actually handle this situation). If a server not supporting this new scheme were to receive a SYN/ClientHello packet, it would simply respond with the usual SYN-ACK and queue the ClientHello until it receives the final ACK. The client would see a SYN-ACK packet with no data, send an ACK and wait for the ServerHello to come a bit later.
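
The fallback described above could look roughly like the following sketch. Everything here is hypothetical: LegacyServer, SecureTcpServer and client_connect are illustrative names, the messages are plain byte strings, and, as noted, I have not verified how real kernels treat data in a SYN.

```python
from dataclasses import dataclass


@dataclass
class LegacyServer:
    """Plain RFC793-style server: queues SYN data, replies with a bare SYN-ACK."""
    queued: bytes = b""

    def on_syn(self, payload: bytes) -> bytes:
        self.queued = payload  # held until the 3-way handshake completes
        return b""             # the SYN-ACK carries no data

    def on_ack(self) -> bytes:
        # Handshake done: the queued ClientHello reaches the SSL layer,
        # which answers with a ServerHello as in the classic flow.
        return b"ServerHello" if self.queued == b"ClientHello" else b""


class SecureTcpServer:
    """Hypothetical server that understands the combined handshake."""

    def on_syn(self, payload: bytes) -> bytes:
        return b"ServerHello" if payload == b"ClientHello" else b""


def client_connect(server) -> list:
    """Return the client's view of the exchange as a list of step names."""
    steps = ["send SYN+ClientHello"]
    syn_ack_data = server.on_syn(b"ClientHello")
    if syn_ack_data:
        steps.append("got SYN-ACK+ServerHello: continue combined handshake")
    else:
        steps.append("got bare SYN-ACK: send ACK and wait")
        if server.on_ack() == b"ServerHello":
            steps.append("got ServerHello: continue classic handshake")
    return steps


for srv in (LegacyServer(), SecureTcpServer()):
    print(type(srv).__name__, client_connect(srv))
```

The point of the sketch is that the client makes its decision purely from whether the SYN-ACK carried data, so it needs no prior knowledge of what the server supports.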

Security Considerations

One security vulnerability with the proposed scheme is that it would make a SYN flood DoS attack easier to execute. Since the data from the ClientHello would need to be stored with the embryonic connection, each half-open connection would use more memory, causing the system to run out of memory sooner than it does today. The new scheme would also be incompatible with SYN cookies, a countermeasure against SYN flood attacks: SYN cookies work by keeping no per-connection state until the final ACK arrives, which leaves nowhere to hold the ClientHello.
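
A back-of-the-envelope calculation shows the shape of the memory problem. All three numbers below are invented for illustration (they are not measurements of any real kernel or SSL library), and max_half_open is a hypothetical helper.

```python
def max_half_open(memory_budget_bytes, base_state_bytes, queued_payload_bytes=0):
    """How many embryonic connections fit into a fixed memory budget."""
    per_connection = base_state_bytes + queued_payload_bytes
    return memory_budget_bytes // per_connection


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
BUDGET = 64 * 1024 * 1024   # memory set aside for half-open connections
BASE_STATE = 256            # per-connection bookkeeping today
CLIENT_HELLO = 256          # extra bytes queued per SYN under the proposal

today = max_half_open(BUDGET, BASE_STATE)
proposed = max_half_open(BUDGET, BASE_STATE, CLIENT_HELLO)
print(today, proposed)  # prints: 262144 131072
```

With these made-up sizes, an attacker needs half as many spoofed SYNs to exhaust the same budget; the real ratio depends on the actual sizes involved.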

What about SSL Sessions?

SSL actually has a mechanism for turning the 4-way handshake into a 2-way handshake for all but the first connection (the client caches a session identifier issued by the server). This can continue to function as is and, combined with the proposed scheme, would make the whole TCP/SSL connection establishment take just a single round trip.
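
The four combinations can be tabulated with a few lines of arithmetic. This is a sketch that uses the round-trip counts from this post (1 for TCP, 2 for a full SSL handshake, 1 for a resumed one); setup_round_trips is a name I made up, and the 150 ms round-trip time is an arbitrary example value.

```python
def setup_round_trips(combined: bool, resumed: bool) -> int:
    """Round trips before the client can send its first application byte."""
    tcp = 1
    ssl = 1 if resumed else 2
    # The combined scheme rides the first SSL exchange on the SYN / SYN-ACK.
    overlap = 1 if combined else 0
    return tcp + ssl - overlap


RTT_MS = 150  # arbitrary example round-trip time

for combined in (False, True):
    for resumed in (False, True):
        trips = setup_round_trips(combined, resumed)
        print(f"combined={combined!s:5} resumed={resumed!s:5} "
              f"round trips={trips} delay={trips * RTT_MS} ms")
```

Only the combination of the proposed scheme and a resumed session gets down to a single round trip.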

TCP Fast Open

A recent proposal from Google, called TCP Fast Open, is designed to reduce the overhead of the TCP 3-way handshake. In a similar way to SSL sessions, TCP Fast Open generates a cryptographically signed cookie that the client can store and then use for subsequent connections. While the first client-server connection uses the full 3-way handshake, subsequent ones can have data sent and delivered to the application via the SYN packet. While TCP Fast Open should be able to peacefully coexist with this proposal, it would not help in reducing the number of round trips for SSL's session establishment.
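
To illustrate the cookie idea, here is a minimal sketch. It is not the construction used by the actual TCP Fast Open proposal: I am using a truncated HMAC-SHA256 over the client IP as a stand-in, and make_cookie, accept_syn_data and SERVER_SECRET are hypothetical names.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # known only to the server


def make_cookie(client_ip: str, secret: bytes = SERVER_SECRET) -> bytes:
    """Cookie handed to the client after a full 3-way handshake.

    Stand-in construction: truncated HMAC-SHA256 of the client's IP address.
    """
    return hmac.new(secret, client_ip.encode(), hashlib.sha256).digest()[:8]


def accept_syn_data(client_ip: str, cookie, secret: bytes = SERVER_SECRET) -> bool:
    """Deliver SYN data to the application only if the cookie checks out."""
    if cookie is None:
        return False  # first contact: fall back to the normal handshake
    return hmac.compare_digest(cookie, make_cookie(client_ip, secret))


cookie = make_cookie("66.55.44.33")                # issued on the first connection
print(accept_syn_data("66.55.44.33", cookie))      # same client, later connection
print(accept_syn_data("10.0.0.1", cookie))         # cookie replayed from another IP
print(accept_syn_data("66.55.44.33", None))        # no cookie yet
```

Because the cookie is bound to the client's address and can only be produced with the server's secret, a spoofed SYN from an address that never completed a handshake carries no valid cookie, which is the property the 3-way handshake otherwise provides.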

Related Work

A proposal from 2009 attempts to solve a similar problem in a generic way. It proposes the ability to include a limited amount of data (up to 64 bytes) in the SYN-ACK packet coming from the server. That data could then carry the first part of the handshake. However, although the proposal aims to be generic (not tied to a specific upper-layer protocol), it does not allow data to be included in the client's SYN packet, making it incompatible with protocols that rely on the client to initiate the handshake. The 64-byte limit may also be too small for SSL's needs.

Conclusion

With more services utilizing HTTPS and a growing number of wireless devices whose communication latencies are often greater than those of their wired counterparts, it is imperative to minimize the number of round trips made during connection establishment. While I assume that the proposed scheme is not adequate for implementation as is, I hope it can serve as food for thought on how to improve the performance of future web applications.