I wanted to know how WireGuard works internally, but I never thought reading the code and understanding the internals would be such a fun and rewarding experience. I used the Go implementation and the WireGuard paper as references (mostly the paper, initially). The most interesting aspect to me was how this piece of software touches on many networking and cryptographic concepts, presenting a great opportunity for learning. During the process, I decided to write my own version of WireGuard (miniwg). It is a purely didactic implementation and, despite being fully functional, it is not production ready.
I have also been doing some exploratory work to understand how you can run WireGuard in a userspace TCP/IP networking stack. Spanza is a POC on how to use DERP servers to overcome the WireGuard requirement of having a direct connection between peers. You run a process alongside WireGuard and point the WireGuard endpoint of the remote peer to that local process. Spanza forwards the traffic over a DERP server, and the DERP server delivers it to the final peer. Your traffic takes an extra hop (the DERP server), which carries a performance penalty, but you get the benefit of always being able to maintain connectivity between those two peers - regardless of networking restrictions.
As long as outbound 443/tcp traffic is allowed, you’ll always be able to connect. Also note that if you run WireGuard in a userspace network stack, you can link the Spanza logic into your process and have access to that functionality.
But today I wanted to capture the knowledge I’ve gained by reading the WireGuard paper while trying to understand how it works. This post captures my understanding of how WireGuard works and I thought it may be useful for others who are interested in studying this wonderful piece of software.
If you see mistakes or have comments on how to improve this post, please, reach out and let me know.
Should we begin?
The wireguard logo. Do you know the story behind it?
WireGuard has been a piece of technology that I have always admired for its simplicity and the immensely useful functionality it provides. It transformed a corner of networking that desperately needed new ideas: VPNs. IPsec and OpenVPN are great technologies but they are difficult to master and to deploy properly. Too many options and configuration knobs can translate into security problems. Jason A. Donenfeld, the author of WireGuard, came along and released (2016) a piece of software that followed the Unix philosophy: do one thing and do it well. I would argue that WireGuard is to networking and VPNs what Linux was to operating systems or Git was to version control systems.
I remember tinkering with it when it came out and thinking, “OK, I can set this up and be reasonably confident I’m not doing something wrong that may translate into security issues down the road.” I also remember how it delivered consistently great latency and throughput. As WireGuard became more and more popular, it eventually moved from a standalone Linux kernel module to being part of the mainline Linux kernel (2019). Here is a quote from the always eloquent Linus Torvalds about WireGuard:
Can I just once again state my love for [WireGuard] and hope it gets merged soon? Maybe the code isn't perfect, but I've skimmed it, and compared to the horrors that are OpenVPN and IPSec, it's a work of art.
And my favorite networking product (Tailscale) uses WireGuard for the data plane (the component that pushes data once a connection is created between two peers). Another reason for wanting to learn more about its internals.
Someone once told me they didn’t care that Tailscale used WireGuard for the data plane - Tailscale could have just implemented their own, right? After all, WireGuard follows the Noise Protocol Framework for the cryptographic handshake (more on that later). I disagree. Writing my toy version and reading the paper taught me just how many things can go wrong. WireGuard was written by Jason A. Donenfeld, someone with extensive cryptography and security experience, and its protocol has even been formally verified by the research community.
Let’s look at how WireGuard works at a very high level. Here are a couple of drawings that capture the flow of data when one peer (process) connects to another. I have created two separate drawings for inbound and outbound traffic.
Note: The diagrams below show a “standard” kernel-mode WireGuard setup. There are also userspace implementations like wireguard-go and boringtun which work differently, but the core protocol concepts remain the same.
WireGuard components: outbound traffic.
Everything starts with a process making an operating system syscall to open a socket. In Unix, the process will get a file descriptor that it can use to read and write data. When creating a socket, you specify different things, among them the IP address of the machine/process you want to send the data to.
The call enters the kernel and traverses the TCP/IP stack. Along the way, headers are added. When we reach the IP layer, the kernel sees that the device handling traffic for that destination address is a virtual interface (TUN device). A virtual interface looks like a hardware network interface, but packets do not flow down to the lower/hardware levels. Instead, the kernel delivers them to the process (or kernel module) that created that device. In this case, our WireGuard process.
The WireGuard process examines the IP headers and searches for a peer in its cryptokey routing table that matches that IP address (or network address). If it finds one, it encrypts the inner IP packet using ChaCha20-Poly1305 and sends the encrypted blob over the UDP connection that it has open to the remote peer.

WireGuard components: inbound traffic.

Packets coming from the other peer follow the opposite direction. The NIC delivers the packets to the TCP/IP stack, the kernel removes the headers and delivers the packet to the process that is at the end of that UDP connection (our WireGuard process). There, WireGuard authenticates and decrypts the payload. Now we have a plaintext IP packet that can be injected into the TUN device. The packet traverses the TCP/IP stack, headers are removed, and the payload is delivered to the process at the other end of the connection.
And that’s it! A high-level view of how WireGuard works. Armed with this knowledge, we can now iterate and add more details and nuances. Let’s start with the cryptographic handshake that every peer performs to create encrypted, authenticated connections. I have to tell you, this is the part I spent the most time on. I wasn’t very familiar with the cryptographic concepts that WireGuard uses. But it was incredibly rewarding when things clicked. This cryptographic choreography we are going to describe here is very powerful and is what makes WireGuard secure.
We want to create a secure connection between two peers. But what does that mean? Here are some things we’d like to achieve:

- Confidentiality: only the two peers can read the traffic.
- Authenticity: each peer knows it is talking to the right party.
- Integrity: tampered packets are detected and dropped.
- Forward secrecy: a static key leaking later does not expose past sessions.
- Replay protection: captured packets cannot be resent and accepted.
- Identity hiding: eavesdroppers cannot learn who is talking to whom.
- DoS resistance: a peer can cheaply reject junk before doing expensive crypto.
Achieving that level of security on your own is hard, and you can easily make mistakes while building a protocol to achieve those goals. Jason A. Donenfeld, the author of WireGuard, used proven cryptographic protocols to achieve them. Specifically, he chose the Noise Protocol Framework, and within it the IKpsk2 pattern. The I means the Initiator’s static key is sent (encrypted) during the handshake, and the K means the Responder’s static key is Known beforehand (pre-shared in the configuration). The psk2 part means an optional pre-shared symmetric key is mixed in during the second handshake message, adding a layer of quantum resistance.
Now, we need to learn a few cryptographic primitives and concepts to understand the WireGuard handshake. Think about these as cryptographic building blocks that we put together to achieve the goals described above.
Symmetric Key Cryptography: Systems using this approach use the same key for encryption and decryption. It’s fast but requires the key to be securely shared beforehand. AES and ChaCha20 are concrete examples of this.
Asymmetric Key (Public Key Cryptography): Systems using this approach have a key pair: public key (shareable) and private key (secret). WireGuard uses Curve25519 for its elliptic curve operations.
Hash Function: A one-way function that produces fixed-size output (digest) from any input. It has very important properties: deterministic, irreversible, and collision-resistant. WireGuard uses BLAKE2s.
MAC (Message Authentication Code): A tag that proves message authenticity and integrity using a shared secret. A concrete implementation is HMAC. If I use HMAC(text), I can verify it came from a specific person and the text has not been tampered with. Note that there is no encryption, only authentication. WireGuard uses HMAC-BLAKE2s inside its key derivation function, and a keyed BLAKE2s for the MAC fields in handshake packets.
AEAD (Authenticated Encryption with Associated Data): Combines encryption + authentication in one operation. WireGuard uses ChaCha20-Poly1305. Gotta love the name.
KDF (Key Derivation Function): When we build our crypto protocol - our crypto recipe - we create cryptographic material (our main ingredient): basically, bits with a high level of entropy. Entropy is a measure of randomness or unpredictability in data. It’s not just about the length in bits, but how random those bits are. The password “1234” has very low entropy (predictable), while a key like 0x7a3f2e1d9c8b5a4f3e2d1c0b9a8f7e6d has high entropy (unpredictable, random).
We measure entropy in bits: a 256-bit key has up to 256 bits of entropy if all bits are truly random, but if the bits follow a pattern (like all zeros), the entropy is much lower. In cryptography, we typically want 128 bits or more of entropy for strong security.
If entropy is weak, an attacker can perform a brute force attack by trying all possible values. For example, if a key only has 40 bits of entropy (even if it’s 256 bits long), an attacker only needs to try 2^40 (~1 trillion) possibilities instead of 2^256, making it feasible to crack with modern computers. This is why truly random, high-entropy keys are critical.
We apply a KDF to cryptographic material to extract and concentrate entropy so we can use it as a key. Why not just use the cryptographic material directly, you may ask? The raw output from operations like Diffie-Hellman (see below) may have patterns or biases - it’s not uniformly random. A KDF takes this imperfect input and produces clean, uniformly random output that looks indistinguishable from true randomness. It also ensures we get exactly the key length we need. WireGuard uses HKDF. We will talk more about it as we go over the exact handshake operations.
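To make this concrete, the paper defines the KDF as HKDF instantiated with HMAC-BLAKE2s, returning one, two, or three 32-byte outputs depending on what the handshake step needs. Here is a minimal stdlib-only Python sketch of that construction (the variable names and the example values at the bottom are mine, for illustration):

```python
import hashlib
import hmac

def _hmac(key: bytes, data: bytes) -> bytes:
    # HMAC using BLAKE2s as the underlying hash, as in the WireGuard paper.
    return hmac.new(key, data, hashlib.blake2s).digest()

def kdf(n: int, key: bytes, input_material: bytes) -> list:
    """HKDF with HMAC-BLAKE2s. Returns n 32-byte outputs.

    WireGuard only ever needs n in {1, 2, 3} (Kdf1/Kdf2/Kdf3 in the paper).
    """
    # Extract: concentrate the entropy of the input into a pseudorandom key.
    prk = _hmac(key, input_material)
    # Expand: derive n independent-looking 32-byte outputs.
    outputs = []
    t = b""
    for i in range(1, n + 1):
        t = _hmac(prk, t + bytes([i]))
        outputs.append(t)
    return outputs

# Example: mix a DH result into the chaining key and get a temporary key.
chaining_key = hashlib.blake2s(b"some previous state").digest()
dh_result = b"\x7a" * 32  # placeholder for a real Curve25519 output
chaining_key, temp_key = kdf(2, chaining_key, dh_result)
```

Note how the same call both updates the chaining key (first output) and produces a throwaway encryption key (second output) - exactly the "clean and accumulate" behavior described above.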
DH (Diffie-Hellman Key Exchange): A protocol for two parties to agree on a shared secret over a public channel. Alice and Bob derive the same secret without ever sending it. They use the other party’s public key and their own private key to derive that secret value. Mind-blowing. No wonder Diffie and Hellman won the Turing Award for their work on public-key cryptography.
Nonce (Number Used Once): A value that must never repeat with the same key. Ensures uniqueness. Example: A counter incremented for each message: 0, 1, 2, 3… This nonce adds critical uniqueness to make each cryptographic operation produce different output, even when the inputs (key + message) are the same.
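In WireGuard’s data packets, the nonce is exactly this kind of counter: a 64-bit value incremented per packet. Implementations expand it into the 12-byte ChaCha20-Poly1305 nonce by prefixing four zero bytes and encoding the counter little-endian. A tiny sketch of that packing (my helper name, not from any particular codebase):

```python
import struct

def counter_to_nonce(counter: int) -> bytes:
    """Expand the 64-bit message counter into a 96-bit AEAD nonce.

    Layout: 4 zero bytes followed by the counter, little-endian.
    """
    return b"\x00" * 4 + struct.pack("<Q", counter)

nonce = counter_to_nonce(0)  # nonce for the very first data packet
```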
Let’s look at the actual data WireGuard sends over the wire during the handshake. Most of the time, with only two packets and 1-RTT (one round-trip time), both peers get what they need to create a secure connection. Other protocols require more packets, making them slower in this regard. WireGuard is a fast protocol, and this is one of the reasons why.
Don’t worry if not everything makes sense. How and why we use these bytes in the packet will become more clear as we study the computations required to implement the Noise protocol selected by WireGuard.
The Initiator creates a packet that contains the following:

- Message type (1 byte, 0x01) plus 3 reserved zero bytes
- Sender index (4 bytes): a random value identifying this session
- Ephemeral public key (32 bytes), sent in plaintext
- Encrypted static public key (48 bytes: 32 bytes + 16-byte auth tag)
- Encrypted timestamp (28 bytes: 12-byte TAI64N + 16-byte auth tag)
- mac1 (16 bytes) and mac2 (16 bytes)
The Responder validates packet 1, completes the handshake, and sends back a response containing:

- Message type (1 byte, 0x02) plus 3 reserved zero bytes
- Sender index (4 bytes) and receiver index (4 bytes)
- Ephemeral public key (32 bytes), sent in plaintext
- Encrypted empty payload (just a 16-byte auth tag)
- mac1 (16 bytes) and mac2 (16 bytes)
After the handshake (the initiator receives the packet from the responder) we have:

- A sending transport key and a receiving transport key on each peer
- A sender/receiver index pair to quickly identify the session
- Matching cryptographic state (chaining key and hash) on both sides
The handshake is complete! Both peers can now start encrypting and sending data packets using the derived transport keys.
This is for the “happy path,” so we have 1-RTT (Round Trip Time). But if the Responder is under heavy load, we would have 3 packets (the additional packet being the cookie reply message described in section 5.4.7 of the paper). We will focus on the happy path for our implementation.
But how does WireGuard generate the values it includes in the packets sent over the wire? By following the “recipe” from the Noise Protocol Framework. Let’s start with the first packet the initiator sends.
We need two main components for our handshake state: the hash and the chaining key (Hi and Ci respectively in the paper).
The hash serves two purposes. First, it keeps a running transcript of all operations we perform during the handshake, allowing both peers to verify they followed the same steps. Second, we use it as “additional authenticated data” in AEAD operations, which binds each encryption to the specific handshake context so attackers cannot replay encrypted data in a different context.
The chaining key accumulates cryptographic material (entropy) from which we derive encryption keys. When I first read about “accumulating crypto material,” I didn’t fully understand it. But here’s what’s happening: we’re continuously mixing in new sources of randomness (like Diffie-Hellman results) to increase the entropy pool. Each time we perform a DH operation, we extract its output and mix it into the chaining key using a KDF (Key Derivation Function). The KDF does two things: it “cleans” the imperfect randomness from the DH operation and outputs both an updated chaining key (with more entropy) and a temporary encryption key. By the end of the handshake, the chaining key contains accumulated entropy from multiple sources - ephemeral keys, static keys, and optional pre-shared keys - making our final transport keys cryptographically strong. By strong I mean they have high entropy, which makes it much harder for an adversary to brute-force or exploit patterns in the key material. There’s much more to talk about regarding entropy and its impact on cryptography. I’m not an expert, but this mental model helps me understand how crypto is used in this context.
The handshake has four phases: create packet 1, validate packet 1, create the response, and validate the response. Within each phase, I like to group the computations so we understand what purpose they serve. When I was studying the paper, I created four different diagrams to help me follow the computations (see below).
Let’s start with the first phase where the Initiator creates the first packet.
GROUP 1: Initialize State
We start by initializing the chaining key and hash to their initial values. The chaining key starts with a hash of the protocol name (“Noise_IKpsk2…”), and the hash starts by mixing in the protocol identifier (“WireGuard v1…”) and then the Responder’s static public key. This binds the handshake to a specific protocol and a specific peer.
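The initial values come straight from the paper: the chaining key is the BLAKE2s hash of the construction name, and the hash then mixes in the identifier and the Responder’s static public key. A stdlib-only sketch (the function name is mine, and the 32-byte key at the bottom is a placeholder):

```python
import hashlib

# Constants defined in the WireGuard paper.
CONSTRUCTION = b"Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"
IDENTIFIER = b"WireGuard v1 zx2c4 Jason@zx2c4.com"

def blake2s(data: bytes) -> bytes:
    return hashlib.blake2s(data).digest()

def initialize_state(responder_static_pub: bytes):
    """Compute the initial chaining key (Ci) and hash (Hi)."""
    ci = blake2s(CONSTRUCTION)              # bind to the Noise construction
    hi = blake2s(ci + IDENTIFIER)           # bind to the WireGuard protocol
    hi = blake2s(hi + responder_static_pub) # bind to this specific peer
    return ci, hi

ci, hi = initialize_state(b"\x01" * 32)  # placeholder public key
```

Because both constants are fixed, every WireGuard peer in the world starts from the same Ci; the state only diverges once peer-specific material is mixed in.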
GROUP 2: Generate Ephemeral Key & Mix Into State
We generate a fresh ephemeral keypair using our language’s cryptographic random number generator (which taps into the OS’s random subsystem). The ephemeral public key (32 bytes) goes into the packet unencrypted (everyone can see it, and that’s fine because it provides forward secrecy). We mix this ephemeral public key into both our hash and our chaining key using a KDF. Now our state contains this “fresh randomness”.
GROUP 3: First DH + Encrypt Static Key
We perform our first Diffie-Hellman operation using our ephemeral private key and the Responder’s static public key. This creates a shared secret that only we and the Responder can compute. We use a KDF to mix this shared secret into the chaining key, which also gives us a temporary encryption key. We use that temporary key to encrypt our static public key with AEAD (this hides our identity from eavesdroppers). The encrypted result (48 bytes: 32 bytes of encrypted key + 16-byte auth tag) gets added to the packet and mixed into the hash.
GROUP 4: Second DH + Encrypt Timestamp
We perform a second Diffie-Hellman operation, this time using our static private key and the Responder’s static public key. This provides mutual authentication (we’re proving we know the private key for our static public key). Again, we use a KDF to mix this new shared secret into the chaining key and get another fresh temporary encryption key. We grab the current time as a TAI64N timestamp (12 bytes), encrypt it with AEAD for replay protection, and add the result (28 bytes: 12 + 16-byte tag) to the packet. We mix the encrypted timestamp into the hash.
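The TAI64N timestamp itself is simple: 8 big-endian bytes of seconds offset by 2^62, followed by 4 big-endian bytes of nanoseconds. A sketch of the encoding; note that a fully correct implementation also adds the TAI-UTC leap-second offset to the seconds value, which I omit here for simplicity:

```python
import struct
import time

TAI64_BASE = 1 << 62  # TAI64 labels for dates after 1970 start at 2^62

def tai64n_now() -> bytes:
    """Encode the current time as a 12-byte TAI64N timestamp.

    Simplification: ignores the TAI-UTC leap-second offset.
    """
    now = time.time()
    seconds = int(now)
    nanoseconds = int((now - seconds) * 1_000_000_000)
    return struct.pack(">QI", TAI64_BASE + seconds, nanoseconds)

ts = tai64n_now()
```

Because the encoding is big-endian, comparing two timestamps byte-by-byte gives the same result as comparing the times, which makes the Responder’s "must be newer than the last one seen" check a plain byte comparison.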
GROUP 5: Build Packet & Compute MACs
Now we assemble the final packet. We generate a random sender index (4 bytes) to identify this session locally. We set the message type to 0x01 (handshake initiation) and add 3 reserved zero bytes for memory alignment.
For MAC1, we compute a MAC using a hash of the Responder’s public key as the MAC key. This lets the Responder quickly verify we know their public key (if we don’t, they can drop the packet immediately without doing any expensive crypto). This protects against port scanning and DoS attacks. Isn’t it brilliant?
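Concretely, mac1 is a keyed BLAKE2s with a 16-byte output: the key is the hash of the label "mac1----" concatenated with the Responder’s static public key, and the input is every packet byte that precedes the mac1 field. A stdlib sketch (function name is mine; the example inputs are placeholders):

```python
import hashlib

LABEL_MAC1 = b"mac1----"  # label defined in the WireGuard paper

def compute_mac1(responder_static_pub: bytes, msg_so_far: bytes) -> bytes:
    """Keyed BLAKE2s-128 over the packet bytes preceding the mac1 field."""
    mac_key = hashlib.blake2s(LABEL_MAC1 + responder_static_pub).digest()
    return hashlib.blake2s(msg_so_far, digest_size=16, key=mac_key).digest()

mac1 = compute_mac1(b"\x01" * 32, b"example packet bytes")
```

A single keyed hash over bytes already in hand is orders of magnitude cheaper than a Curve25519 operation, which is why checking mac1 first is such an effective DoS filter.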
MAC2 is set to all zeros because we haven’t received a cookie yet. If the Responder is under heavy load, they’ll send us a cookie, and we’ll need to MAC our next attempt with it.
And just like that, we’ve created our first handshake packet (148 bytes total). Boom!
Computations required to generate the first packet in the handshake.
The Responder receives the packet and unmarshals it. It confirms the message type is 0x01 (handshake initiation), saves the sender index for later use, and extracts the ephemeral public key. Immediately after that, it validates MAC1 and MAC2. Remember that for the happy path, MAC2 will be all zeros.
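Since the initiation message has a fixed 148-byte layout, unmarshaling is a straightforward slice-up. A sketch of the field offsets (the function and field names are mine, not from any particular implementation):

```python
import struct

def parse_initiation(packet: bytes) -> dict:
    """Split a 148-byte handshake initiation message into its fields."""
    if len(packet) != 148:
        raise ValueError("handshake initiation must be 148 bytes")
    if packet[0] != 0x01:
        raise ValueError("not a handshake initiation message")
    return {
        "type": packet[0],                      # 1 byte: 0x01
        "reserved": packet[1:4],                # 3 zero bytes
        "sender_index": struct.unpack_from("<I", packet, 4)[0],
        "ephemeral": packet[8:40],              # 32 bytes, plaintext
        "encrypted_static": packet[40:88],      # 32 bytes + 16-byte tag
        "encrypted_timestamp": packet[88:116],  # 12 bytes + 16-byte tag
        "mac1": packet[116:132],                # 16 bytes
        "mac2": packet[132:148],                # 16 bytes
    }
```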
GROUP 1: Initialize State (Match Initiator)
We initialize the hash and chaining key to match exactly what the Initiator did. This is critical - both peers must follow the same steps to arrive at the same cryptographic state. We start with the protocol name, mix in the identifier, and then mix in our own static public key (we’re the Responder, so we use our key here).
GROUP 2: First DH + Decrypt Static Key
We perform our first Diffie-Hellman operation using our static private key and the Initiator’s ephemeral public key (the one we just extracted from the packet). This generates the same shared secret the Initiator computed in their GROUP 3. We run a KDF to update the chaining key and generate a temporary decryption key. We use that key with AEAD to decrypt and verify the Initiator’s static public key from the packet (48 bytes: 32 bytes encrypted + 16-byte auth tag). If decryption fails, we silently drop the packet. Finally, we update our hash with the encrypted static key.
GROUP 3: Second DH + Decrypt Timestamp
We perform our second Diffie-Hellman operation using our static private key and the Initiator’s static public key (which we just decrypted). This generates the same shared secret the Initiator computed in their GROUP 4. We run another KDF to update the chaining key and generate another temporary decryption key. We use AEAD to decrypt the timestamp (28 bytes: 12 bytes encrypted + 16-byte auth tag). We then validate the timestamp - it must be newer than any timestamp we’ve previously seen from this peer. If it’s older or equal, we silently drop the packet (replay protection). As always, we update our hash with the encrypted timestamp.
At this point, our chaining key (Ci) and hash (Hi) match exactly what the Initiator has. Both peers are now in sync! If all validations passed, we’re ready to create the response packet.
Responder: computations to validate first handshake packet.
Now that the Responder has validated the Initiator’s packet and reconstructed the matching cryptographic state (Ci and Hi), it continues building on that state to create the response packet.
GROUP 1: Generate Ephemeral Key & Mix Into State
Just like the Initiator did, we generate a fresh ephemeral keypair using our cryptographic random number generator. The ephemeral public key (32 bytes) goes into the response packet unencrypted. We mix this ephemeral public key into both our hash and our chaining key.
GROUP 2: Third DH + Mix Pre-Shared Key
We perform our third Diffie-Hellman operation, this time using our ephemeral private key and the Initiator’s ephemeral public key (from their packet). This creates a shared secret that depends on both ephemeral keys. We use a KDF to mix this shared secret into the chaining key. This is the key operation for forward secrecy - even if static keys leak later, this ephemeral-to-ephemeral DH cannot be replayed.
Next, if a pre-shared key (PSK) was configured, we mix it into the chaining key using another KDF. This provides quantum resistance. Even if quantum computers break Diffie-Hellman in the future, the PSK (shared beforehand via a secure out-of-band channel) protects the session. We update the hash to include the PSK as well.
GROUP 3: Derive Transport Keys
Now we have accumulated enough entropy from all three DH operations (and optionally the PSK). We run a final KDF on the chaining key to derive two transport keys:

- A sending key, used to encrypt the data packets we send
- A receiving key, used to decrypt the data packets the peer sends us

These are the keys both peers will use to encrypt data packets after the handshake completes. The Responder has them now; the Initiator will derive the same keys once it receives the response packet we are building.
GROUP 4: Encrypt Empty Payload
To prove we completed all operations correctly and derived the right keys, we encrypt an empty payload (0 bytes) using AEAD with the transport sending key. The result is just a 16-byte authentication tag. This serves as cryptographic proof that we have the correct keys. We mix this encrypted empty payload into the hash.
GROUP 5: Build Response Packet & Compute MACs
We assemble the response packet. We generate a random sender index (4 bytes) for our side of the session and include the Initiator’s sender index as the receiver index (4 bytes) to confirm which session this response is for. We set the message type to 0x02 (handshake response) and add 3 reserved zero bytes for memory alignment.
For MAC1, we compute a MAC using a hash of the Initiator’s public key (which we decrypted earlier) as the MAC key. MAC2 is set to zeros (unless we’re responding with a cookie, which is outside the happy path).
And that’s it! We’ve created the response packet (92 bytes total). We can now send it to the Initiator.
You won’t find any protocol getting so much out of only 92 bytes.
Responder: computations to generate the response packet.
The Initiator receives the response packet from the Responder. It unmarshals the packet, confirms the message type is 0x02 (handshake response), verifies the receiver index matches the sender index from our original packet, and extracts the Responder’s ephemeral public key. It validates MAC1 and MAC2, then continues the cryptographic computations where it left off.
GROUP 1: Mix Responder’s Ephemeral Key
We take the Responder’s ephemeral public key (32 bytes) from the packet and mix it into our hash and chaining key. At this point, our state should match exactly what the Responder has after their GROUP 1.
GROUP 2: Third DH + Mix Pre-Shared Key (Match Responder)
We perform the same third Diffie-Hellman operation the Responder did - using our ephemeral private key and the Responder’s ephemeral public key (which we just extracted). This generates the same shared secret the Responder computed. We run a KDF to mix this into the chaining key.
If a pre-shared key was configured, we mix it into the chaining key exactly as the Responder did. We update the hash with the PSK. Our state now matches the Responder’s after their GROUP 2.
GROUP 3: Derive Transport Keys (Match Responder)
We run the same final KDF on the chaining key to derive the two transport keys. However, notice the keys are swapped from our perspective:

- Our sending key is the Responder’s receiving key
- Our receiving key is the Responder’s sending key
Both peers now have identical key material, just labeled differently based on their role.
GROUP 4: Decrypt and Verify Empty Payload
We use AEAD to decrypt the empty payload (16-byte auth tag) from the response packet using the transport receiving key. If decryption fails, we silently drop the packet - it means the Responder didn’t derive the correct keys, so the handshake failed. If it succeeds, this is cryptographic proof that both peers completed all operations correctly and possess the same keys. We mix the encrypted empty payload into our hash.
At this point, our hash (Hi) and chaining key (Ci) match exactly what the Responder has. The handshake is complete! Both peers now have:

- Transport keys for sending and receiving data
- A sender/receiver index pair identifying the session
- Cryptographic proof (via the encrypted empty payload) that the other side derived the same keys
We can now start encrypting and sending data packets.
Initiator: computations to validate the responder packet and finish the handshake.
After the handshake completes, peers exchange encrypted data packets (message type 0x04). These packets carry the actual payload - your TCP/UDP traffic that’s being tunneled through the VPN.
Each data packet contains:

- Message type (1 byte, 0x04) plus 3 reserved zero bytes
- Receiver index (4 bytes): identifies the session at the other end
- Counter (8 bytes): the nonce, incremented for every packet
- Encrypted payload (variable length, plus a 16-byte auth tag)
With the receiver index we can very quickly (O(1)) identify a session, while the counter provides replay protection by rejecting previously seen packet numbers.
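Replay protection can’t simply track the highest counter seen, because UDP can reorder packets in flight. Implementations keep a sliding-window bitmap: accept each counter at most once, tolerate reordering inside the window, and reject anything older. A small sketch of the idea (the class is mine; real implementations use a larger window and constant-size word arrays):

```python
WINDOW_SIZE = 64  # illustrative; production windows are larger

class ReplayFilter:
    """Sliding-window replay check for monotonically increasing counters."""

    def __init__(self) -> None:
        self.highest = -1  # highest counter accepted so far
        self.bitmap = 0    # bit i set => counter (highest - i) was seen

    def check_and_update(self, counter: int) -> bool:
        if counter > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = counter - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << WINDOW_SIZE) - 1)
            self.highest = counter
            return True
        offset = self.highest - counter
        if offset >= WINDOW_SIZE:
            return False  # too old: fell out of the window
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset  # reordered but fresh: accept once
        return True
```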
Every packet is authenticated through the Poly1305 MAC to ensure it hasn’t been tampered with, and forward secrecy is maintained since the transport keys were derived from ephemeral keys during the handshake.
In section 6 of the WireGuard paper, Jason (notice how casually I reference the author by his first name) covers timers and the stateless nature of WireGuard. Stateless here means that from the user’s perspective, there is nothing they have to manage once the WireGuard interface is enabled and active. Behind the scenes, WireGuard manages connections, reconnections, rekeying, etc.
To achieve that, WireGuard keeps internal state: sessions between peers, timers (to know when to rekey and start new sessions), and counters. Users do not interact with that at all.
In the paper, we can see some defined constants (section 6.1) used when implementing the state machine that handles state management logic.
| Constant | Value | Purpose |
|---|---|---|
| Rekey-After-Messages | 2^60 | Soft limit - start rekeying after this many packets |
| Reject-After-Messages | 2^64 - 2^13 - 1 | Hard limit - stop all traffic |
| Rekey-After-Time | 120 seconds | Soft limit - opportunistic rekey |
| Reject-After-Time | 180 seconds | Hard limit - session expires |
| Rekey-Attempt-Time | 90 seconds | How long to retry handshakes |
| Rekey-Timeout | 5 seconds | Wait between handshake retries |
| Keepalive-Timeout | 10 seconds | Send keepalive if idle |
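These constants drive a simple decision on the send path. Here is a sketch of the kind of check an implementation might perform before sending a data packet - a hypothetical helper to illustrate the soft/hard limit split, not code from any real implementation:

```python
# Constants from section 6.1 of the WireGuard paper.
REKEY_AFTER_MESSAGES = 2**60
REJECT_AFTER_MESSAGES = 2**64 - 2**13 - 1
REKEY_AFTER_TIME = 120.0   # seconds
REJECT_AFTER_TIME = 180.0  # seconds

def on_send(session_age: float, messages_sent: int, is_initiator: bool) -> str:
    """Decide what to do with the current session when sending a packet."""
    if session_age >= REJECT_AFTER_TIME or messages_sent >= REJECT_AFTER_MESSAGES:
        return "reject"  # hard limit: stop traffic until a new handshake
    if messages_sent >= REKEY_AFTER_MESSAGES:
        return "send-and-rekey"  # soft limit on packet count
    if is_initiator and session_age >= REKEY_AFTER_TIME:
        return "send-and-rekey"  # opportunistic time-based rekey
    return "send"
```

Notice the time-based branch only fires for the initiator, matching the rekeying rules described in the next section.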
This section of the paper (section 6.2) captures when WireGuard triggers rekeying (meaning creating a new session).
WireGuard defines a set of triggers that, when activated, cause the system to perform rekeying. These are soft triggers (called “opportunistic” in the paper). What that means is that we are flexible about when we can start a new session; it doesn’t have to happen immediately.
WireGuard tries to create a new session when:
1. Packet-based trigger:
   - After sending 2^60 messages (Rekey-After-Messages in the paper). This is a very high number: 1,152,921,504,606,846,976 packets to be precise. We should never hit that trigger in practice.
2. Time-based trigger (initiator only):
   - After 120 seconds (Rekey-After-Time), when sending a packet
   - After 165 seconds (Reject-After-Time - Keepalive-Timeout - Rekey-Timeout), when receiving a packet (safety net if only receiving)

No timer fires at exactly 120s. Instead, WireGuard waits until you naturally send a packet, then checks: “Has it been ≥120s?” If yes, it starts the handshake. It could happen at 120.5s, 145s, whenever the next packet goes out. There is no need to rekey if we’re not sending any data. This makes WireGuard efficient and silent when idle.
Notice that only the initiator of the current session does time-based rekeying. The reason is to avoid both peers rekeying simultaneously. That would create “collisions” and make WireGuard less efficient (the paper uses the term thundering herd problem).
There is a final special case when the initiator is only receiving packets, not sending. In that case, at 165s (180 - 10 - 5) it will trigger a handshake. The reason is to make sure we don’t enter a situation where packets arrive for a session that has expired.
A new handshake starts around 120s, but packets encrypted with old keys might still be in-flight. The Responder can’t send until receiving the first packet from the Initiator. We need a way to transition to new sessions without dropping packets.
The solution is to have three session “slots” that WireGuard keeps in memory (described in section 6.3 of the paper):
1. Previous session: the old session, kept around so in-flight packets encrypted with the old keys can still be decrypted. It expires 180 seconds (Reject-After-Time) after creation.
2. Current session: the active session used to encrypt and decrypt traffic.
3. Next session (responder only): a freshly negotiated session waiting to be confirmed.
Notice that the “next” session slot is only used by the responder. The reason is that the responder can only send a packet with the new keys after it has received a packet from the initiator with the new keys (to confirm the handshake, as described in section 5 of the paper). Until then, that session stays in the “next” slot.
Hopefully you have found this useful and it can help you read the paper and understand better how WireGuard works. It took me a lot of time to put all the pieces together and there are still some aspects I’m working through.
Please, reach out if you have questions or suggestions on how to improve this document. Or just drop me a message if you found it useful. I am planning on releasing more posts where we will focus on the actual implementation. Hearing what you think will help me write those future posts.
drio out!