CPA attack on an AES Key Unwrap implementation (cont'd)

1 – Introduction

In the previous article (here), we demonstrated how the Key Encryption Key (KEK) used in an AES Key Unwrap operation can be recovered.

We performed a side-channel attack using Correlation Power Analysis (CPA) on electromagnetic (EM) traces.

However, the attack was somewhat simplified, as we had access to a trigger signal indicating the start of the AES operation, and we also knew the AES key in use.

Knowing the key allowed us to evaluate the efficiency of the attack by computing key rankings.

In this new article, we’ll take on a more realistic challenge: attacking a black-box device with no embedded trigger and no knowledge of the encryption key.

The device will not be opened; only a small hole was made to insert the EM probe and place it near the microcontroller performing the decryption.

2 – Synchronization

Since we don’t have an internal trigger, we need a way to synchronize the EM captures with the Key Unwrap operation. To do this, we’ll leverage the known system behavior to identify an optimal synchronization point.

Our interface with the system is a serial port (or something similar). We can send wrapped data, encrypted with the KEK, to the device. The response tells us whether the data were successfully unwrapped. We can use specific data patterns within the communication stream as a makeshift trigger.

In the traces below, you can see:

• In blue, the serial frames transmitted to the device (the longer one) and received from it (the shorter one),
• In purple, the pulses that indicate the start of the Key Unwrap operation (only present in the patched firmware, on the known device),
• In orange, the EM signal captured by the probe.

What we observe is that the Key Unwrap operation occurs very shortly before the device begins its response. Therefore, we can use the start of the response as our trigger point.

To implement this, I connected a Raspberry Pi Pico to my ad-hoc serial interface and programmed it to detect the beginning of the response. Upon detection, the Pico outputs a rising edge on one of its GPIO pins, which serves as our external trigger signal (visible as the green trace).

Now, let’s zoom in and enable the persistence mode on the oscilloscope. We’re triggering on the rising edge of the green trace:

We can observe some jitter, but most of the Key Unwrap operations consistently begin around 9.6 ms before the trigger. This offset will serve as our reference point.

By capturing a 200 µs window, we can ensure that the beginning of the Key Unwrap operation is included.

We intentionally avoid capturing too much data, as doing so would slow down the analysis.

3 – Trace alignment

This time, trace alignment is more challenging. Due to the relatively high jitter, the traces are significantly less aligned compared to our previous experiment, where we had the advantage of an internal trigger.

Using the internal trigger on the known equipment, we can identify the patterns [1, 2] just before the first AES [3] (captured with the 20 MHz bandwidth limit enabled):


This will help the identification and the alignment of the traces captured on the new equipment.

Let’s see what our captures look like. Here are the first three frames:


The known patterns are barely recognizable due to the high noise level — a result of capturing with the full bandwidth.

To facilitate shape identification and improve alignment, we apply a 20 MHz low-pass filter to the captured traces. These filtered traces are used only to determine the alignment offset, which is then applied to the original full-bandwidth traces.
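As a rough illustration of the filtering step, here is a minimal pure-Python sketch. It uses a moving average as a crude stand-in for a low-pass filter; the article does not say which filter was actually used, and the `half_width` parameter and the synthetic data are assumptions. A real 20 MHz cutoff would have to be designed from the oscilloscope sample rate with a proper DSP library. The point it shows is that the output keeps the same length as the input, so an offset measured on the filtered trace can be applied directly to the unfiltered one.

```python
def moving_average(trace, half_width=2):
    """Crude low-pass: replace each sample by the mean of its neighbourhood.

    The output has the same length as the input, so sample indices in the
    filtered trace and in the original trace refer to the same instant.
    """
    out = []
    for i in range(len(trace)):
        lo = max(0, i - half_width)
        hi = min(len(trace), i + half_width + 1)
        out.append(sum(trace[lo:hi]) / (hi - lo))
    return out


# Synthetic data: a slow pulse buried under a fast alternating disturbance.
slow = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
noisy = [s + (0.5 if i % 2 else -0.5) for i, s in enumerate(slow)]
smooth = moving_average(noisy, half_width=2)
```

After filtering, the slow pulse dominates again, which is what makes the known patterns recognizable.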


We can now easily identify the patterns. The first AES is clearly visible (3). Let’s zoom in:

We use this filtered segment as a reference for the cross-correlation process used to align the traces.

At the same time, we crop each trace to this region to retain only the portion corresponding to the AES operation.
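The alignment and cropping steps can be sketched as follows. This is a simplified pure-Python version using a normalized cross-correlation (Pearson coefficient) over every possible shift; the function names and the tiny synthetic traces are hypothetical, and a real implementation would use a vectorized library to cope with the trace volume. The offset is searched on the filtered trace and then used to crop the unfiltered one.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def find_offset(reference, filtered_trace):
    """Shift (in samples) at which the reference best matches the trace."""
    n = len(reference)
    scores = [pearson(reference, filtered_trace[s:s + n])
              for s in range(len(filtered_trace) - n + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


def align_and_crop(reference, filtered_trace, raw_trace):
    """Locate the reference in the filtered trace, crop the unfiltered one."""
    s = find_offset(reference, filtered_trace)
    return raw_trace[s:s + len(reference)]


# Synthetic example: the reference pattern starts at sample 4.
reference = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0, 0.0]
filtered = [0.1, 0.0, -0.1, 0.0] + reference + [0.0, 0.1, 0.0]
raw = list(range(100, 114))  # stands in for the full-bandwidth trace
offset = find_offset(reference, filtered)
cropped = align_and_crop(reference, filtered, raw)
```

Normalizing the correlation makes the match depend on the shape of the pattern rather than on its amplitude, which varies from one capture to the next.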

This is the result of 5 aligned traces (still filtered):

During the alignment process, I also applied masks to define which traces are considered valid. In this example, the masks are represented by red boxes.

Any trace that falls outside the valid region — like the orange one — is excluded from the final dataset.
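One possible way to express such masks in code is shown below. This is a sketch under an assumed interpretation: each red box is taken to be a sample range together with an amplitude range, and an aligned trace is valid only if it stays inside every box. The tuple layout and the values are hypothetical.

```python
def passes_masks(trace, masks):
    """True if the trace stays inside every box.

    Each mask is (start, end, low, high): all samples in trace[start:end]
    must lie within [low, high].
    """
    return all(low <= value <= high
               for start, end, low, high in masks
               for value in trace[start:end])


# Two boxes: a positive peak around samples 2-3, a negative one around 6-7.
masks = [(2, 4, 0.5, 1.5), (6, 8, -1.5, -0.5)]

good = [0.0, 0.1, 1.0, 0.9, 0.0, 0.0, -1.0, -1.1, 0.0]
shifted = [0.0, 0.0, 0.1, 1.0, 0.9, 0.0, 0.0, -1.0, -1.1]  # off by one sample

kept = [t for t in (good, shifted) if passes_masks(t, masks)]
```

Here the trace that is misaligned by a single sample leaves the first box and is dropped from the dataset.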


Here is the final result — a portion of the first round of the AES operation, shown using the unfiltered traces:

The alignment looks solid, so we can now proceed with the analysis.

4 – Analysis

4.1 – Finding leakage

As in the previous article, we want to check whether we can detect leakage of known information — in this case, the input to the AES algorithm.

This is the result of the CPA performed on 15,000 traces:

It gives us two key pieces of information:

• It confirms the presence of leakage,
• It verifies the starting point of the AES operation.
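To make the leakage check concrete, here is a minimal sketch on synthetic data. It assumes a Hamming-weight leakage of one known input byte (the article does not state the exact model used for this step) and computes the correlation at every sample position; the trace count, noise level and leak position are made up for the example.

```python
import random
from math import sqrt


def hamming_weight(x):
    return bin(x).count("1")


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def correlation_trace(predictions, traces):
    """Correlation between the predicted leakage and each sample position."""
    n_samples = len(traces[0])
    return [pearson(predictions, [t[i] for t in traces])
            for i in range(n_samples)]


# Synthetic traces: Gaussian noise, plus the Hamming weight of a known
# input byte added at sample LEAK_AT.
rng = random.Random(0)
LEAK_AT = 7
inputs = [rng.randrange(256) for _ in range(500)]
traces = []
for value in inputs:
    t = [rng.gauss(0.0, 1.0) for _ in range(20)]
    t[LEAK_AT] += hamming_weight(value)
    traces.append(t)

rho = correlation_trace([hamming_weight(v) for v in inputs], traces)
peak = max(range(len(rho)), key=lambda i: abs(rho[i]))
```

A clear peak tells us both that the device leaks and where in the trace the known data is being processed.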

We can now proceed with the CPA on the full dataset, using the ciphertexts and the AES AddRoundKey step as the leakage model.

4.2 – CPA analysis

I ran the CPA attack using the same methodology as in the previous experiment, this time with 90,000 traces.

As expected, the correct key byte candidates appear as prominent peaks in the CPA results.

For example, here are the results for the first and second bytes:

After some time, we obtain candidate values for all key bytes. In theory, the top-ranked candidate for each byte should form the correct key.

However, as we observed in the previous article, this isn’t always the case. Noise, misalignment, or other imperfections can cause the correct value to rank slightly lower.

To address this, we take the top few candidates for each byte and perform a brute-force search over the resulting key space.
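The ranking and shortlisting of one key byte can be sketched like this. The example is synthetic and assumes a Hamming-weight model of the AddRoundKey output (ciphertext byte XOR key guess) with a positive leakage sign; the key value, noise level and shortlist size are hypothetical.

```python
import random
from math import sqrt


def hamming_weight(x):
    return bin(x).count("1")


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def rank_key_guesses(ciphertext_bytes, samples):
    """Return the 256 guesses for one key byte, best correlation first."""
    scores = {}
    for guess in range(256):
        model = [hamming_weight(c ^ guess) for c in ciphertext_bytes]
        scores[guess] = pearson(model, samples)
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic leakage for one byte position at one point in time.
rng = random.Random(1)
TRUE_KEY = 0x3C  # made-up value for the demo
cts = [rng.randrange(256) for _ in range(1000)]
samples = [hamming_weight(c ^ TRUE_KEY) + rng.gauss(0.0, 1.0) for c in cts]

ranking = rank_key_guesses(cts, samples)
shortlist = ranking[:3]  # top few candidates kept for the brute force
```

Under this model, a guess that differs from the correct byte by a single bit still correlates strongly with the traces (its prediction agrees on seven of the eight bits), which may be one reason why neighbouring values tend to appear together near the top of the ranking.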

In this particular case, we can’t afford many candidates per byte: we’re limited to trying one key every 120 ms, which makes brute-forcing on the device very time-consuming.
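A quick calculation shows how fast the cost grows. The sketch below assumes a 16-byte key (AES-128) and the same number of candidates for every byte, and counts the worst case where the correct combination is tried last.

```python
SECONDS_PER_TRY = 0.120  # from the text: one key every 120 ms
KEY_BYTES = 16           # assumption: AES-128


def worst_case_seconds(candidates_per_byte):
    """Time to try every combination on the device."""
    return (candidates_per_byte ** KEY_BYTES) * SECONDS_PER_TRY


for n in (1, 2, 3, 4):
    seconds = worst_case_seconds(n)
    print(f"{n} per byte: {n ** KEY_BYTES:,} keys, {seconds / 3600:,.1f} hours")
```

With 2 candidates per byte there are 65,536 keys to try, about 2.2 hours. With 3 candidates per byte the count rises to roughly 43 million keys, about 60 days, and with 4 it exceeds 16 years.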

Let’s zoom in on the plot for the first byte to see if we can reduce the number of candidates:

The value 0x50 seems to be a very strong candidate. To be safe, we’ll also include 0x52 in our shortlist.

Now, let’s take a look at the second byte:

This time, 0x12 and 0x11 appear to be strong candidates.

Let’s examine one last byte before I reveal which values were actually correct. This is byte 13:

In this case, you might choose from 0x64, 0x67, 0x65, or 0x66.

The correct key bytes for these examples were:

• First byte: 0x50 (ranked 1),
• Second byte: 0x13 (ranked 7),
• Byte 13: 0x66 (ranked 4).

Clearly, we were far from selecting the correct values on the first attempt.

To improve the attack and reduce the number of key candidates, I reused the hardware with the known key. I placed the EM probe in exactly the same position as on the target device, and performed the same CPA attack.

The results provide a visual reference template we can rely on to guide the selection of the correct key candidates.

This approach is essentially a manual version of a template attack.

4.3 – CPA analysis, usage of a template

The top trace shows the CPA results for the known key, where the correct byte 0xBD is ranked 3.


In the bottom traces, we can identify a trace that matches the pattern observed in the top trace. Based on this, we identify 0x13 as a strong candidate.

It turns out that 0x13 is indeed the correct value.
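In the article this comparison is done by eye. Purely as an illustration, the same idea could be automated by scoring how closely each candidate's correlation-versus-time curve follows the shape of the template curve. The sketch below is a hypothetical automation, not the method used here, and every number in it (curves and candidate values) is made up.

```python
from math import sqrt


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def best_match(template, candidate_curves):
    """Candidate whose curve shape is closest to the template curve."""
    return max(candidate_curves,
               key=lambda guess: pearson(template, candidate_curves[guess]))


# Template: curve of the correct byte on the known-key device (made up).
template = [0.0, 0.1, 0.4, 0.1, -0.2, 0.0]

# Curves of three candidates on the target device (made up).
candidate_curves = {
    0xA0: [0.3, 0.0, 0.1, 0.3, 0.0, 0.1],
    0xA1: [-0.1, 0.3, 0.0, -0.3, 0.2, 0.1],
    0xA2: [0.01, 0.07, 0.33, 0.09, -0.15, 0.0],
}

choice = best_match(template, candidate_curves)
```

The comparison looks at the shape of the curve rather than its peak height, which is why a candidate that is not ranked first can still be picked out.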

4.4 – Brute forcing

Once we’ve narrowed down the key byte candidates, we can begin the brute-force phase. Since we need to send a wrapped key to the device, the process involves several steps:

• Generate all possible combinations of the candidate bytes,
• These combinations are candidate AES round 10 keys (since this is a decryption process, we start from the last round key), so we reverse the key schedule to recover the corresponding AES keys,
• With these AES keys, generate all possible wrapped versions of the payload we want to send to the device.
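The first two steps above can be sketched in pure Python. The sketch assumes AES-128 (implied by the mention of round 10) and rebuilds the S-box from its mathematical definition to stay self-contained; the function names are hypothetical. It is checked against the key expansion example from FIPS-197 (Appendix A.1), not against the key recovered in this article. The wrapping step is left out because it needs a full AES implementation.

```python
from itertools import product


def gmul(a, b):
    """Multiplication in GF(2^8) with the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return r


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def invert_key_schedule(last_round_key):
    """Recover the AES-128 key from its round 10 key (16 bytes)."""
    k = list(last_round_key)
    for rnd in range(10, 0, -1):
        # Words 3, 2, 1: undo the XOR with the preceding word.
        for i in range(15, 3, -1):
            k[i] ^= k[i - 4]
        # Word 0: undo SubWord(RotWord(previous word 3)) XOR Rcon.
        rotated = (k[13], k[14], k[15], k[12])
        for j in range(4):
            k[j] ^= SBOX[rotated[j]]
        k[0] ^= RCON[rnd]
    return bytes(k)


def candidate_master_keys(shortlists):
    """One AES key per combination of the per-byte round 10 candidates."""
    for combo in product(*shortlists):
        yield invert_key_schedule(bytes(combo))


# FIPS-197 Appendix A.1: cipher key and its last round key.
MASTER = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
ROUND10 = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")

# Demo shortlist: every byte certain, except byte 1 with two candidates.
shortlists = [[b] for b in ROUND10]
shortlists[1] = [0x15, 0x14]
keys = list(candidate_master_keys(shortlists))
```

`itertools.product` varies the last byte fastest, so putting the most likely candidate first in each shortlist means the most likely combinations are tried early.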

With a good choice of candidates, the attack is very fast. Here, only one byte was wrongly guessed:

5 – Conclusion

I’ve successfully completed a side-channel attack!

Overall it took:

• A day to capture all the traces (very slow serial interface),
• A day to align the traces (very slow algorithm),
• Several hours to select candidates and perform the brute-force attack, without success,
• A day to capture all the traces of the known hardware/key,
• A day to align the traces,
• Several hours to select candidates and just minutes to brute-force and find the key!

Relying solely on CPA results, without using a template, can lead to a significant amount of brute-forcing — especially when testing keys directly on the device.

If I had access to a wrapped key that was encrypted with the target KEK, I could have performed the brute-force search offline, at a much higher speed. In that case, even having 4 or 5 candidates per key byte would be entirely manageable.

Having access to identical hardware with a known key proved very helpful in two key areas:

• Timing: It helped pinpoint the location of the AES operation within the EM traces
• Key guessing: It provided a visual template to guide the selection of key candidates

Franck Jullien - 2025

www.collshade.fr

Follow me on X and Bluesky! @fjullien06, fjullien.bsky.social