I wanted to know how WireGuard works internally, but I never thought reading the code and understanding the internals would be such a fun and rewarding experience. I used the Go implementation and the WireGuard paper as references (mostly the paper, initially). The most interesting aspect to me was how this piece of software touches on many networking and cryptographic concepts, presenting a great opportunity for learning. During the process, I decided to write my own version of WireGuard (miniwg). It is a purely didactic implementation and, despite being fully functional, it is not production ready.
I have also been doing some exploratory work to understand how you can run WireGuard in a userspace TCP/IP networking stack. Spanza is a POC on how to use DERP servers to overcome the WireGuard requirement of having a direct connection between peers. You run a process alongside WireGuard and point the WireGuard endpoint of the remote peer to that local process. Spanza forwards the traffic over a DERP server, and the DERP server delivers it to the final peer. Your traffic takes an extra hop (the DERP server), which carries a performance penalty, but you get the benefit of always being able to maintain connectivity between those two peers - regardless of networking restrictions.
As long as outbound 443/tcp traffic is allowed, you’ll always be able to connect. Also note that if you run WireGuard in a userspace network stack, you can link the Spanza logic into your process and have access to that functionality.
But today I wanted to capture the knowledge I’ve gained by reading the WireGuard paper while trying to understand how it works. This post captures my understanding of how WireGuard works and I thought it may be useful for others who are interested in studying this wonderful piece of software.
If you see mistakes or have comments on how to improve this post, please, reach out and let me know.
Should we begin?
The wireguard logo. Do you know the story behind it?
WireGuard has been a piece of technology that I have always admired for its simplicity and the immensely useful functionality it provides. It transformed a corner of networking that desperately needed new ideas: VPNs. IPsec and OpenVPN are great technologies but they are difficult to master and to deploy properly. Too many options and configuration knobs can translate into security problems. Jason A. Donenfeld, the author of WireGuard, came along and released (2016) a piece of software that followed the Unix philosophy: do one thing and do it well. I would argue that WireGuard is to networking and VPNs what Linux was to operating systems or Git was to version control systems.
I remember tinkering with it when it came out and thinking, “OK, I can set this up and be reasonably confident I’m not doing something wrong that may translate into security issues down the road.” I also remember how it delivered consistently great latency and throughput. As WireGuard became more and more popular, it eventually moved from a standalone Linux kernel module to being part of the mainline Linux kernel (2019). Here is a quote from the always eloquent Linus Torvalds about WireGuard:
Can I just once again state my love for [WireGuard] and hope it gets merged soon? Maybe the code isn't perfect, but I've skimmed it, and compared to the horrors that are OpenVPN and IPSec, it's a work of art.
And my favorite networking product (Tailscale) uses WireGuard for the data plane (the component that pushes data once a connection is created between two peers). Another reason for wanting to learn more about its internals.
Someone once told me they didn’t care that Tailscale used WireGuard for the data plane - Tailscale could have just implemented their own, right? After all, WireGuard follows the Noise Protocol Framework for the cryptographic handshake (more on that later). I disagree. Writing my toy version and reading the paper taught me just how many things can go wrong. WireGuard was written by Jason A. Donenfeld, someone with extensive cryptography and security experience, and its protocol has even been formally verified by the research community.
Let’s look at how WireGuard works at a very high level. Here are a couple of drawings that capture the flow of data when one peer (process) connects to another. I have created two separate drawings for inbound and outbound traffic.
Note: The diagrams below show a “standard” kernel-mode WireGuard setup. There are also userspace implementations like wireguard-go and boringtun which work differently, but the core protocol concepts remain the same.
WireGuard components: outbound traffic.
Everything starts with a process making an operating system syscall to open a socket. In Unix, the process will get a file descriptor that it can use to read and write data. When creating a socket, you specify different things, among them the IP address of the machine/process you want to send the data to.
The call enters the kernel and traverses the TCP/IP stack. Along the way, headers are added. When we reach the IP layer, the kernel sees that the device handling traffic for that destination address is a virtual interface (TUN device). A virtual interface looks like a hardware network interface, but packets do not flow down to the lower/hardware levels. Instead, the kernel delivers them to the process (or kernel module) that created that device. In this case, our WireGuard process.
The WireGuard process examines the IP headers and searches for a peer in its cryptokey routing table that matches that IP address (or network address). If it finds one, it encrypts the inner IP packet using ChaCha20-Poly1305 and sends the encrypted blob over the UDP connection that it has open to the remote peer.

WireGuard components: inbound traffic.

Packets coming from the other peer follow the opposite direction. The NIC delivers the packets to the TCP/IP stack, the kernel removes the headers and delivers the packet to the process that is at the end of that UDP connection (our WireGuard process). There, WireGuard authenticates and decrypts the payload. Now we have a plaintext IP packet that can be injected into the TUN device. The packet traverses the TCP/IP stack, headers are removed, and the payload is delivered to the process at the other end of the connection.
And that’s it! A high-level view of how WireGuard works. Armed with this knowledge, we can now iterate and add more details and nuances. Let’s start with the cryptographic handshake that every peer performs to create encrypted, authenticated connections. I have to tell you, this is the part I spent the most time on. I wasn’t very familiar with the cryptographic concepts that WireGuard uses. But it was incredibly rewarding when things clicked. This cryptographic choreography we are going to describe here is very powerful and is what makes WireGuard secure.
We want to create a secure connection between two peers. But what does that mean? Here are some things we’d like to achieve:

- Confidentiality: only the two peers can read the traffic.
- Authenticity: each peer knows it is talking to the right party.
- Integrity: tampered packets are detected and dropped.
- Forward secrecy: a static key leaking later does not expose past sessions.
- Replay protection: captured packets cannot be resent and accepted.
- Identity hiding: eavesdroppers cannot learn who is talking to whom.
- DoS resistance: a peer can cheaply reject junk before doing expensive crypto.
Achieving that level of security on your own is hard, and you can easily make mistakes while building a protocol to achieve those goals. Jason A. Donenfeld, the author of WireGuard, used proven cryptographic protocols to achieve them. Specifically, he chose the Noise Protocol Framework, and within it the IKpsk2 pattern. The I means the Initiator’s static key is sent (encrypted) during the handshake, and the K means the Responder’s static key is Known beforehand (pre-shared in the configuration). The psk2 part means an optional pre-shared symmetric key is mixed in during the second handshake message, adding a layer of quantum resistance.
Now, we need to learn a few cryptographic primitives and concepts to understand the WireGuard handshake. Think about these as cryptographic building blocks that we put together to achieve the goals described above.
Symmetric Key Cryptography: Systems using this approach use the same key for encryption and decryption. It’s fast but requires the key to be securely shared beforehand. AES and ChaCha20 are concrete examples of this.
Asymmetric Key (Public Key Cryptography): Systems using this approach have a key pair: public key (shareable) and private key (secret). WireGuard uses Curve25519 for its elliptic curve operations.
Hash Function: A one-way function that produces fixed-size output (digest) from any input. It has very important properties: deterministic, irreversible, and collision-resistant. WireGuard uses BLAKE2s.
MAC (Message Authentication Code): A tag that proves message authenticity and integrity using a shared secret. A concrete implementation is HMAC. If I use HMAC(text), I can verify it came from a specific person and the text has not been tampered with. Note that there is no encryption, only authentication. WireGuard uses HMAC-BLAKE2s inside its key derivation function, and a keyed BLAKE2s for the MAC fields in handshake packets.
AEAD (Authenticated Encryption with Associated Data): Combines encryption + authentication in one operation. WireGuard uses ChaCha20-Poly1305. Gotta love the name.
KDF (Key Derivation Function): When we build our crypto protocol - our crypto recipe - we create cryptographic material (our main ingredient): basically, bits with a high level of entropy. Entropy is a measure of randomness or unpredictability in data. It’s not just about the length in bits, but how random those bits are. The password “1234” has very low entropy (predictable), while a key like 0x7a3f2e1d9c8b5a4f3e2d1c0b9a8f7e6d has high entropy (unpredictable, random).
We measure entropy in bits: a 256-bit key has up to 256 bits of entropy if all bits are truly random, but if the bits follow a pattern (like all zeros), the entropy is much lower. In cryptography, we typically want 128 bits or more of entropy for strong security.
If entropy is weak, an attacker can perform a brute force attack by trying all possible values. For example, if a key only has 40 bits of entropy (even if it’s 256 bits long), an attacker only needs to try 2^40 (~1 trillion) possibilities instead of 2^256, making it feasible to crack with modern computers. This is why truly random, high-entropy keys are critical.
We apply a KDF to cryptographic material to extract and concentrate entropy so we can use it as a key. Why not just use the cryptographic material directly, you may ask? The raw output from operations like Diffie-Hellman (see below) may have patterns or biases - it’s not uniformly random. A KDF takes this imperfect input and produces clean, uniformly random output that looks indistinguishable from true randomness. It also ensures we get exactly the key length we need. WireGuard uses HKDF. We will talk more about it as we go over the exact handshake operations.
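To make this concrete, the paper defines the KDF as HKDF instantiated with HMAC-BLAKE2s, returning one, two, or three 32-byte outputs depending on what the handshake step needs. Here is a minimal stdlib-only Python sketch of that construction (the variable names and the example values at the bottom are mine, for illustration):

```python
import hashlib
import hmac

def _hmac(key: bytes, data: bytes) -> bytes:
    # HMAC using BLAKE2s as the underlying hash, as in the WireGuard paper.
    return hmac.new(key, data, hashlib.blake2s).digest()

def kdf(n: int, key: bytes, input_material: bytes) -> list:
    """HKDF with HMAC-BLAKE2s. Returns n 32-byte outputs.

    WireGuard only ever needs n in {1, 2, 3} (Kdf1/Kdf2/Kdf3 in the paper).
    """
    # Extract: concentrate the entropy of the input into a pseudorandom key.
    prk = _hmac(key, input_material)
    # Expand: derive n independent-looking 32-byte outputs.
    outputs = []
    t = b""
    for i in range(1, n + 1):
        t = _hmac(prk, t + bytes([i]))
        outputs.append(t)
    return outputs

# Example: mix a DH result into the chaining key and get a temporary key.
chaining_key = hashlib.blake2s(b"some previous state").digest()
dh_result = b"\x7a" * 32  # placeholder for a real Curve25519 output
chaining_key, temp_key = kdf(2, chaining_key, dh_result)
```

Note how the same call both updates the chaining key (first output) and produces a throwaway encryption key (second output) - exactly the "clean and accumulate" behavior described above.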
DH (Diffie-Hellman Key Exchange): A protocol for two parties to agree on a shared secret over a public channel. Alice and Bob derive the same secret without ever sending it. They use the other party’s public key and their own private key to derive that secret value. Mind-blowing. No wonder Diffie and Hellman won the Turing Award for their work on public-key cryptography.
Nonce (Number Used Once): A value that must never repeat with the same key. Ensures uniqueness. Example: A counter incremented for each message: 0, 1, 2, 3… This nonce adds critical uniqueness to make each cryptographic operation produce different output, even when the inputs (key + message) are the same.
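In WireGuard’s data packets, the nonce is exactly this kind of counter: a 64-bit value incremented per packet. Implementations expand it into the 12-byte ChaCha20-Poly1305 nonce by prefixing four zero bytes and encoding the counter little-endian. A tiny sketch of that packing (my helper name, not from any particular codebase):

```python
import struct

def counter_to_nonce(counter: int) -> bytes:
    """Expand the 64-bit message counter into a 96-bit AEAD nonce.

    Layout: 4 zero bytes followed by the counter, little-endian.
    """
    return b"\x00" * 4 + struct.pack("<Q", counter)

nonce = counter_to_nonce(0)  # nonce for the very first data packet
```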
Let’s look at the actual data WireGuard sends over the wire during the handshake. Most of the time, with only two packets and 1-RTT (one round-trip time), both peers get what they need to create a secure connection. Other protocols require more packets, making them slower in this regard. WireGuard is a fast protocol, and this is one of the reasons why.
Don’t worry if not everything makes sense. How and why we use these bytes in the packet will become more clear as we study the computations required to implement the Noise protocol selected by WireGuard.
The Initiator creates a packet that contains the following:

- Message type (1 byte, 0x01) plus 3 reserved zero bytes
- Sender index (4 bytes): a random value identifying this session
- Ephemeral public key (32 bytes), sent in plaintext
- Encrypted static public key (48 bytes: 32 bytes + 16-byte auth tag)
- Encrypted timestamp (28 bytes: 12-byte TAI64N + 16-byte auth tag)
- mac1 (16 bytes) and mac2 (16 bytes)
The Responder validates packet 1, completes the handshake, and sends back a response containing:

- Message type (1 byte, 0x02) plus 3 reserved zero bytes
- Sender index (4 bytes) and receiver index (4 bytes)
- Ephemeral public key (32 bytes), sent in plaintext
- Encrypted empty payload (just a 16-byte auth tag)
- mac1 (16 bytes) and mac2 (16 bytes)
After the handshake (the initiator receives the packet from the responder) we have:

- A sending transport key and a receiving transport key on each peer
- A sender/receiver index pair to quickly identify the session
- Matching cryptographic state (chaining key and hash) on both sides
The handshake is complete! Both peers can now start encrypting and sending data packets using the derived transport keys.
This is for the “happy path,” so we have 1-RTT (Round Trip Time). But if the Responder is under heavy load, we would have 3 packets (the additional packet being the cookie reply message described in section 5.4.7 of the paper). We will focus on the happy path for our implementation.
But how does WireGuard generate the values it includes in the packets sent over the wire? By following the “recipe” from the Noise Protocol Framework. Let’s start with the first packet the initiator sends.
We need two main components for our handshake state: the hash and the chaining key (Hi and Ci respectively in the paper).
The hash serves two purposes. First, it keeps a running transcript of all operations we perform during the handshake, allowing both peers to verify they followed the same steps. Second, we use it as “additional authenticated data” in AEAD operations, which binds each encryption to the specific handshake context so attackers cannot replay encrypted data in a different context.
The chaining key accumulates cryptographic material (entropy) from which we derive encryption keys. When I first read about “accumulating crypto material,” I didn’t fully understand it. But here’s what’s happening: we’re continuously mixing in new sources of randomness (like Diffie-Hellman results) to increase the entropy pool. Each time we perform a DH operation, we extract its output and mix it into the chaining key using a KDF (Key Derivation Function). The KDF does two things: it “cleans” the imperfect randomness from the DH operation and outputs both an updated chaining key (with more entropy) and a temporary encryption key. By the end of the handshake, the chaining key contains accumulated entropy from multiple sources - ephemeral keys, static keys, and optional pre-shared keys - making our final transport keys cryptographically strong. By strong I mean they have high entropy, which makes it much harder for an adversary to brute-force or exploit patterns in the key material. There’s much more to talk about regarding entropy and its impact on cryptography. I’m not an expert, but this mental model helps me understand how crypto is used in this context.
The handshake has four phases: create packet 1, validate packet 1, create the response, and validate the response. Within each phase, I like to group the computations so we understand what purpose they serve. When I was studying the paper, I created four different diagrams to help me follow the computations (see below).
Let’s start with the first phase where the Initiator creates the first packet.
GROUP 1: Initialize State
We start by initializing the chaining key and hash to their initial values. The chaining key starts with a hash of the protocol name (“Noise_IKpsk2…”), and the hash starts by mixing in the protocol identifier (“WireGuard v1…”) and then the Responder’s static public key. This binds the handshake to a specific protocol and a specific peer.
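The initial values come straight from the paper: the chaining key is the BLAKE2s hash of the construction name, and the hash then mixes in the identifier and the Responder’s static public key. A stdlib-only sketch (the function name is mine, and the 32-byte key at the bottom is a placeholder):

```python
import hashlib

# Constants defined in the WireGuard paper.
CONSTRUCTION = b"Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"
IDENTIFIER = b"WireGuard v1 zx2c4 Jason@zx2c4.com"

def blake2s(data: bytes) -> bytes:
    return hashlib.blake2s(data).digest()

def initialize_state(responder_static_pub: bytes):
    """Compute the initial chaining key (Ci) and hash (Hi)."""
    ci = blake2s(CONSTRUCTION)              # bind to the Noise construction
    hi = blake2s(ci + IDENTIFIER)           # bind to the WireGuard protocol
    hi = blake2s(hi + responder_static_pub) # bind to this specific peer
    return ci, hi

ci, hi = initialize_state(b"\x01" * 32)  # placeholder public key
```

Because both constants are fixed, every WireGuard peer in the world starts from the same Ci; the state only diverges once peer-specific material is mixed in.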
GROUP 2: Generate Ephemeral Key & Mix Into State
We generate a fresh ephemeral keypair using our language’s cryptographic random number generator (which taps into the OS’s random subsystem). The ephemeral public key (32 bytes) goes into the packet unencrypted (everyone can see it, and that’s fine because it provides forward secrecy). We mix this ephemeral public key into both our hash and our chaining key using a KDF. Now our state contains this “fresh randomness”.
GROUP 3: First DH + Encrypt Static Key
We perform our first Diffie-Hellman operation using our ephemeral private key and the Responder’s static public key. This creates a shared secret that only we and the Responder can compute. We use a KDF to mix this shared secret into the chaining key, which also gives us a temporary encryption key. We use that temporary key to encrypt our static public key with AEAD (this hides our identity from eavesdroppers). The encrypted result (48 bytes: 32 bytes of encrypted key + 16-byte auth tag) gets added to the packet and mixed into the hash.
GROUP 4: Second DH + Encrypt Timestamp
We perform a second Diffie-Hellman operation, this time using our static private key and the Responder’s static public key. This provides mutual authentication (we’re proving we know the private key for our static public key). Again, we use a KDF to mix this new shared secret into the chaining key and get another fresh temporary encryption key. We grab the current time as a TAI64N timestamp (12 bytes), encrypt it with AEAD for replay protection, and add the result (28 bytes: 12 + 16-byte tag) to the packet. We mix the encrypted timestamp into the hash.
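The TAI64N timestamp itself is simple: 8 big-endian bytes of seconds offset by 2^62, followed by 4 big-endian bytes of nanoseconds. A sketch of the encoding; note that a fully correct implementation also adds the TAI-UTC leap-second offset to the seconds value, which I omit here for simplicity:

```python
import struct
import time

TAI64_BASE = 1 << 62  # TAI64 labels for dates after 1970 start at 2^62

def tai64n_now() -> bytes:
    """Encode the current time as a 12-byte TAI64N timestamp.

    Simplification: ignores the TAI-UTC leap-second offset.
    """
    now = time.time()
    seconds = int(now)
    nanoseconds = int((now - seconds) * 1_000_000_000)
    return struct.pack(">QI", TAI64_BASE + seconds, nanoseconds)

ts = tai64n_now()
```

Because the encoding is big-endian, comparing two timestamps byte-by-byte gives the same result as comparing the times, which makes the Responder’s "must be newer than the last one seen" check a plain byte comparison.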
GROUP 5: Build Packet & Compute MACs
Now we assemble the final packet. We generate a random sender index (4 bytes) to identify this session locally. We set the message type to 0x01 (handshake initiation) and add 3 reserved zero bytes for memory alignment.
For MAC1, we compute a MAC using a hash of the Responder’s public key as the MAC key. This lets the Responder quickly verify we know their public key (if we don’t, they can drop the packet immediately without doing any expensive crypto). This protects against port scanning and DoS attacks. Isn’t it brilliant?
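Concretely, mac1 is a keyed BLAKE2s with a 16-byte output: the key is the hash of the label "mac1----" concatenated with the Responder’s static public key, and the input is every packet byte that precedes the mac1 field. A stdlib sketch (function name is mine; the example inputs are placeholders):

```python
import hashlib

LABEL_MAC1 = b"mac1----"  # label defined in the WireGuard paper

def compute_mac1(responder_static_pub: bytes, msg_so_far: bytes) -> bytes:
    """Keyed BLAKE2s-128 over the packet bytes preceding the mac1 field."""
    mac_key = hashlib.blake2s(LABEL_MAC1 + responder_static_pub).digest()
    return hashlib.blake2s(msg_so_far, digest_size=16, key=mac_key).digest()

mac1 = compute_mac1(b"\x01" * 32, b"example packet bytes")
```

A single keyed hash over bytes already in hand is orders of magnitude cheaper than a Curve25519 operation, which is why checking mac1 first is such an effective DoS filter.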
MAC2 is set to all zeros because we haven’t received a cookie yet. If the Responder is under heavy load, they’ll send us a cookie, and we’ll need to MAC our next attempt with it.
And just like that, we’ve created our first handshake packet (148 bytes total). Boom!
Computations required to generate the first packet in the handshake.
The Responder receives the packet and unmarshals it. It confirms the message type is 0x01 (handshake initiation), saves the sender index for later use, and extracts the ephemeral public key. Immediately after that, it validates MAC1 and MAC2. Remember that for the happy path, MAC2 will be all zeros.
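Since the initiation message has a fixed 148-byte layout, unmarshaling is a straightforward slice-up. A sketch of the field offsets (the function and field names are mine, not from any particular implementation):

```python
import struct

def parse_initiation(packet: bytes) -> dict:
    """Split a 148-byte handshake initiation message into its fields."""
    if len(packet) != 148:
        raise ValueError("handshake initiation must be 148 bytes")
    if packet[0] != 0x01:
        raise ValueError("not a handshake initiation message")
    return {
        "type": packet[0],                      # 1 byte: 0x01
        "reserved": packet[1:4],                # 3 zero bytes
        "sender_index": struct.unpack_from("<I", packet, 4)[0],
        "ephemeral": packet[8:40],              # 32 bytes, plaintext
        "encrypted_static": packet[40:88],      # 32 bytes + 16-byte tag
        "encrypted_timestamp": packet[88:116],  # 12 bytes + 16-byte tag
        "mac1": packet[116:132],                # 16 bytes
        "mac2": packet[132:148],                # 16 bytes
    }
```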
GROUP 1: Initialize State (Match Initiator)
We initialize the hash and chaining key to match exactly what the Initiator did. This is critical - both peers must follow the same steps to arrive at the same cryptographic state. We start with the protocol name, mix in the identifier, and then mix in our own static public key (we’re the Responder, so we use our key here).
GROUP 2: First DH + Decrypt Static Key
We perform our first Diffie-Hellman operation using our static private key and the Initiator’s ephemeral public key (the one we just extracted from the packet). This generates the same shared secret the Initiator computed in their GROUP 3. We run a KDF to update the chaining key and generate a temporary decryption key. We use that key with AEAD to decrypt and verify the Initiator’s static public key from the packet (48 bytes: 32 bytes encrypted + 16-byte auth tag). If decryption fails, we silently drop the packet. Finally, we update our hash with the encrypted static key.
GROUP 3: Second DH + Decrypt Timestamp
We perform our second Diffie-Hellman operation using our static private key and the Initiator’s static public key (which we just decrypted). This generates the same shared secret the Initiator computed in their GROUP 4. We run another KDF to update the chaining key and generate another temporary decryption key. We use AEAD to decrypt the timestamp (28 bytes: 12 bytes encrypted + 16-byte auth tag). We then validate the timestamp - it must be newer than any timestamp we’ve previously seen from this peer. If it’s older or equal, we silently drop the packet (replay protection). As always, we update our hash with the encrypted timestamp.
At this point, our chaining key (Ci) and hash (Hi) match exactly what the Initiator has. Both peers are now in sync! If all validations passed, we’re ready to create the response packet.
Responder: computations to validate first handshake packet.
Now that the Responder has validated the Initiator’s packet and reconstructed the matching cryptographic state (Ci and Hi), it continues building on that state to create the response packet.
GROUP 1: Generate Ephemeral Key & Mix Into State
Just like the Initiator did, we generate a fresh ephemeral keypair using our cryptographic random number generator. The ephemeral public key (32 bytes) goes into the response packet unencrypted. We mix this ephemeral public key into both our hash and our chaining key.
GROUP 2: Third DH + Mix Pre-Shared Key
We perform our third Diffie-Hellman operation, this time using our ephemeral private key and the Initiator’s ephemeral public key (from their packet). This creates a shared secret that depends on both ephemeral keys. We use a KDF to mix this shared secret into the chaining key. This is the key operation for forward secrecy - even if static keys leak later, this ephemeral-to-ephemeral DH cannot be replayed.
Next, if a pre-shared key (PSK) was configured, we mix it into the chaining key using another KDF. This provides quantum resistance. Even if quantum computers break Diffie-Hellman in the future, the PSK (shared beforehand via a secure out-of-band channel) protects the session. We update the hash to include the PSK as well.
GROUP 3: Derive Transport Keys
Now we have accumulated enough entropy from all three DH operations (and optionally the PSK). We run a final KDF on the chaining key to derive two transport keys:

- A sending key, used to encrypt the data packets we send
- A receiving key, used to decrypt the data packets the peer sends us

These are the keys both peers will use to encrypt data packets after the handshake completes. The Responder has them now; the Initiator will derive the same keys once it receives the response packet we are building.
GROUP 4: Encrypt Empty Payload
To prove we completed all operations correctly and derived the right keys, we encrypt an empty payload (0 bytes) using AEAD with the transport sending key. The result is just a 16-byte authentication tag. This serves as cryptographic proof that we have the correct keys. We mix this encrypted empty payload into the hash.
GROUP 5: Build Response Packet & Compute MACs
We assemble the response packet. We generate a random sender index (4 bytes) for our side of the session and include the Initiator’s sender index as the receiver index (4 bytes) to confirm which session this response is for. We set the message type to 0x02 (handshake response) and add 3 reserved zero bytes for memory alignment.
For MAC1, we compute a MAC using a hash of the Initiator’s public key (which we decrypted earlier) as the MAC key. MAC2 is set to zeros (unless we’re responding with a cookie, which is outside the happy path).
And that’s it! We’ve created the response packet (92 bytes total). We can now send it to the Initiator.
You won’t find any protocol getting so much out of only 92 bytes.
Responder: computations to generate the response packet.
The Initiator receives the response packet from the Responder. It unmarshals the packet, confirms the message type is 0x02 (handshake response), verifies the receiver index matches the sender index from our original packet, and extracts the Responder’s ephemeral public key. It validates MAC1 and MAC2, then continues the cryptographic computations where it left off.
GROUP 1: Mix Responder’s Ephemeral Key
We take the Responder’s ephemeral public key (32 bytes) from the packet and mix it into our hash and chaining key. At this point, our state should match exactly what the Responder has after their GROUP 1.
GROUP 2: Third DH + Mix Pre-Shared Key (Match Responder)
We perform the same third Diffie-Hellman operation the Responder did - using our ephemeral private key and the Responder’s ephemeral public key (which we just extracted). This generates the same shared secret the Responder computed. We run a KDF to mix this into the chaining key.
If a pre-shared key was configured, we mix it into the chaining key exactly as the Responder did. We update the hash with the PSK. Our state now matches the Responder’s after their GROUP 2.
GROUP 3: Derive Transport Keys (Match Responder)
We run the same final KDF on the chaining key to derive the two transport keys. However, notice the keys are swapped from our perspective:

- Our sending key is the Responder’s receiving key
- Our receiving key is the Responder’s sending key
Both peers now have identical key material, just labeled differently based on their role.
GROUP 4: Decrypt and Verify Empty Payload
We use AEAD to decrypt the empty payload (16-byte auth tag) from the response packet using the transport receiving key. If decryption fails, we silently drop the packet - it means the Responder didn’t derive the correct keys, so the handshake failed. If it succeeds, this is cryptographic proof that both peers completed all operations correctly and possess the same keys. We mix the encrypted empty payload into our hash.
At this point, our hash (Hi) and chaining key (Ci) match exactly what the Responder has. The handshake is complete! Both peers now have:

- Transport keys for sending and receiving data
- A sender/receiver index pair identifying the session
- Cryptographic proof (via the encrypted empty payload) that the other side derived the same keys
We can now start encrypting and sending data packets.
Initiator: computations to validate the responder packet and finish the handshake.
After the handshake completes, peers exchange encrypted data packets (message type 0x04). These packets carry the actual payload - your TCP/UDP traffic that’s being tunneled through the VPN.
Each data packet contains:

- Message type (1 byte, 0x04) plus 3 reserved zero bytes
- Receiver index (4 bytes): identifies the session at the other end
- Counter (8 bytes): the nonce, incremented for every packet
- Encrypted payload (variable length, plus a 16-byte auth tag)
With the receiver index we can very quickly (O(1)) identify a session, while the counter provides replay protection by rejecting previously seen packet numbers.
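Replay protection can’t simply track the highest counter seen, because UDP can reorder packets in flight. Implementations keep a sliding-window bitmap: accept each counter at most once, tolerate reordering inside the window, and reject anything older. A small sketch of the idea (the class is mine; real implementations use a larger window and constant-size word arrays):

```python
WINDOW_SIZE = 64  # illustrative; production windows are larger

class ReplayFilter:
    """Sliding-window replay check for monotonically increasing counters."""

    def __init__(self) -> None:
        self.highest = -1  # highest counter accepted so far
        self.bitmap = 0    # bit i set => counter (highest - i) was seen

    def check_and_update(self, counter: int) -> bool:
        if counter > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = counter - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << WINDOW_SIZE) - 1)
            self.highest = counter
            return True
        offset = self.highest - counter
        if offset >= WINDOW_SIZE:
            return False  # too old: fell out of the window
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset  # reordered but fresh: accept once
        return True
```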
Every packet is authenticated through the Poly1305 MAC to ensure it hasn’t been tampered with, and forward secrecy is maintained since the transport keys were derived from ephemeral keys during the handshake.
In section 6 of the WireGuard paper, Jason (notice how casually I reference the author by his first name) covers timers and the stateless nature of WireGuard. Stateless here means that from the user’s perspective, there is nothing they have to manage once the WireGuard interface is enabled and active. Behind the scenes, WireGuard manages connections, reconnections, rekeying, etc.
To achieve that, WireGuard keeps internal state: sessions between peers, timers (to know when to rekey and start new sessions), and counters. Users do not interact with that at all.
In the paper, we can see some defined constants (section 6.1) used when implementing the state machine that handles state management logic.
| Constant | Value | Purpose |
|---|---|---|
| Rekey-After-Messages | 2^60 | Soft limit - start rekeying after this many packets |
| Reject-After-Messages | 2^64 - 2^13 - 1 | Hard limit - stop all traffic |
| Rekey-After-Time | 120 seconds | Soft limit - opportunistic rekey |
| Reject-After-Time | 180 seconds | Hard limit - session expires |
| Rekey-Attempt-Time | 90 seconds | How long to retry handshakes |
| Rekey-Timeout | 5 seconds | Wait between handshake retries |
| Keepalive-Timeout | 10 seconds | Send keepalive if idle |
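These constants drive a simple decision on the send path. Here is a sketch of the kind of check an implementation might perform before sending a data packet - a hypothetical helper to illustrate the soft/hard limit split, not code from any real implementation:

```python
# Constants from section 6.1 of the WireGuard paper.
REKEY_AFTER_MESSAGES = 2**60
REJECT_AFTER_MESSAGES = 2**64 - 2**13 - 1
REKEY_AFTER_TIME = 120.0   # seconds
REJECT_AFTER_TIME = 180.0  # seconds

def on_send(session_age: float, messages_sent: int, is_initiator: bool) -> str:
    """Decide what to do with the current session when sending a packet."""
    if session_age >= REJECT_AFTER_TIME or messages_sent >= REJECT_AFTER_MESSAGES:
        return "reject"  # hard limit: stop traffic until a new handshake
    if messages_sent >= REKEY_AFTER_MESSAGES:
        return "send-and-rekey"  # soft limit on packet count
    if is_initiator and session_age >= REKEY_AFTER_TIME:
        return "send-and-rekey"  # opportunistic time-based rekey
    return "send"
```

Notice the time-based branch only fires for the initiator, matching the rekeying rules described in the next section.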
This section of the paper (section 6.2) captures when WireGuard triggers rekeying (meaning creating a new session).
WireGuard defines a set of triggers that, when activated, cause the system to perform rekeying. These are soft triggers (called “opportunistic” in the paper). What that means is that we are flexible about when we can start a new session; it doesn’t have to happen immediately.
WireGuard tries to create a new session when:
1. Packet-based trigger:
   - After sending 2^60 messages (Rekey-After-Messages in the paper). This is a very high number: 1,152,921,504,606,846,976 packets to be precise. We should never hit that trigger in practice.
2. Time-based trigger (initiator only):
   - After 120 seconds (Rekey-After-Time), when sending a packet
   - After 165 seconds (Reject-After-Time - Keepalive-Timeout - Rekey-Timeout), when receiving a packet (safety net if only receiving)

No timer fires at exactly 120s. Instead, WireGuard waits until you naturally send a packet, then checks: “Has it been ≥120s?” If yes, it starts the handshake. It could happen at 120.5s, 145s, whenever the next packet goes out. There is no need to rekey if we’re not sending any data. This makes WireGuard efficient and silent when idle.
Notice that only the initiator of the current session does time-based rekeying. The reason is to avoid both peers rekeying simultaneously. That would create “collisions” and make WireGuard less efficient (the paper uses the term thundering herd problem).
There is a final special case when the initiator is only receiving packets, not sending. In that case, at 165s (180 - 10 - 5) it will trigger a handshake. The reason is to make sure we don’t enter a situation where packets arrive for a session that has expired.
A new handshake starts around 120s, but packets encrypted with old keys might still be in-flight. The Responder can’t send until receiving the first packet from the Initiator. We need a way to transition to new sessions without dropping packets.
The solution is to have three session “slots” that WireGuard keeps in memory (described in section 6.3 of the paper):
1. Previous session: the old session, kept around so in-flight packets encrypted with the old keys can still be decrypted. It expires 180 seconds (Reject-After-Time) after creation.
2. Current session: the active session used to encrypt and decrypt traffic.
3. Next session (responder only): a freshly negotiated session waiting to be confirmed.
Notice that the “next” session slot is only used by the responder. The reason is that the responder can only send a packet with the new keys after it has received a packet from the initiator with the new keys (to confirm the handshake, as described in section 5 of the paper). Until then, that session stays in the “next” slot.
Hopefully you have found this useful and it can help you read the paper and understand better how WireGuard works. It took me a lot of time to put all the pieces together and there are still some aspects I’m working through.
Please, reach out if you have questions or suggestions on how to improve this document. Or just drop me a message if you found it useful. I am planning on releasing more posts where we will focus on the actual implementation. Hearing what you think will help me write those future posts.
drio out!