In my previous post, I explained how we used SHA-256 to cryptographically sign HTTP requests between an Arduino called DoorDuino and a Ruby app known as Doorbot. However, that system was vulnerable to a hash length extension attack. To understand how extension attacks work, let's first discuss how SHA-256 hashes strings.
SHA-256 starts by creating an array of 8 numbers, which start out as predefined constants. Let’s call this array h
. SHA-256 then loops through the message to be hashed in 512-bit chunks. For each chunk, it uses some math magic to modify h
according to the 512 bits of the chunk and the previous value of h
. This continues until all the chunks have been processed and h
has been totally scrambled up. It then takes this final value of h
, converts it to hexadecimal, and outputs it.
Since a SHA-256 hash is just h
in hexadecimal form, anybody who can see the hash knows the final value of h
. There's nothing stopping an attacker from taking that h
, and using the math magic again on another 512-bit chunk to create a new hash of the plaintext of the first hash with additional content added on at the end. The hash will be different, but the attacker doesn't need to know the existing plaintext to generate a new hash with content added on to that plaintext. This means they could change the Doorbot request without knowing the password!
Now exploits aren't any fun if you can't actually use them, so let's try it out! In order to use this exploit, your SHA-256 library must let you set the initial values of the internal hash. Sadly, Ruby's digest
library doesn't have this functionality, so Matthew Avant and I built an implementation of SHA-256 in Rust. We can easily get the sha256 of a string.
sha256(~"kittens");
// c81a7b1e755bdf87160ff008f94c8ecc21bc2a71a23bf5e1351300edc0231a1
Our library also has the function sha256_core
, which allows customization of h
's initial vector, as well as an argument to change the calculated length of the hashed content, which will become useful later. It also accepts arrays of bytes instead of strings. In Rust, we convert strings to byte arrays by calling .as_bytes().to_owned()
on them.
sha256_core("kittens".as_bytes().to_owned(),(
0x6a09e667,
0xbb67ae85,
0x3c6ef372,
0xa54ff53a,
0x510e527f,
0x9b05688c,
0x1f83d9ab,
0x5be0cd19
), 7);
// c81a7b1e755bdf87160ff008f94c8ecc21bc2a71a23bf5e1351300edc0231a1
Those two statements generate the same hash, and are equivalent — internally, sha256
sets the initial value of h
to (0x6a09e667,0xbb67ae85,...)
, converts the message to an array of bytes, and sets the length to the length of the string.
The Authentic Request
Now that our library is all set up and ready to go, let's play the part of Doorbot, making the authenticated request to the Arduino. To unlock the door, our app makes a request to http://doorduino/unlock?nonce=<nonce>
, where <nonce>
is the incrementing number. The app also sends an X-Auth-Hash
header, whose value is a sha256
hash generated like so:
sha256(password + url_requested)
password
is a password that is 15 characters long, and is specific to each doorbot/doorduino pair. For our pair, the password is secret_password
.
For instance, let's do an example where the nonce is 42
. Our app will first calculate the sha256 hash from the URL and the password. These examples are all in Rust, but you shouldn't need to know Rust to understand them — they're pretty simple.
sha256(~"secret_password" + ~"http://doorduino/unlock?nonce=42");
// 2e663bc3fab972a7d33a7b77dc8017bdb326065b1950f6629e448c284de36e9f
Then, our app makes the HTTP request, which looks like this:
POST http://doorduino/unlock?nonce=42 HTTP/1.1
Host: doorduino
Accept: */*
X-Auth-Hash: 2e663bc3fab972a7d33a7b77dc8017bdb326065b1950f6629e448c284de36e9f
The Attack
Meanwhile, our attacker, who is well-equipped with our Rust implementation of SHA-256, spies on the connection and sees the HTTP request above. They also have access to the source code (it's an open source project), but not the actual password used. So, our attacker has three pieces of information:
- How the hash is generated:
sha256(password + url)
- The fact that passwords are always 15 characters long1.
- The URL requested:
http://doorduino/unlock?nonce=42
- The valid hash of the password and the URL:
2e663bc3fab972a7d33a7b77dc8017bdb326065b1950f6629e448c284de36e9f
The attacker wants to generate a new HTTP request with a higher nonce that has a valid hash. Actually figuring out the password from the hash is next to impossible 2, but as we showed earlier, the attacker doesn't need to know the password.
Instead, the attacker feeds the captured hash as the initial value of h
to sha256_core
, along with the string &nonce=43
, and the calculated length of the password. They also pass sha_core
the length in bytes of the existing hashed 512-bit block, which is 64, plus the length of the extension, which is 9, for a total of 73.
sha256_core(~"&nonce=43",(
0x2e663bc3,
0xfab972a7,
0xd33a7b77,
0xdc8017bd,
0xb326065b,
0x1950f662,
0x9e448c28,
0x4de36e9f
), 73);
// 60be0a782cc4664e20b4c14799627b2742e437529b17ba8051fa75e018ee6a68
This new hash is the hash of the contents of the first hash, with &nonce=43
at the end. The attacker did not need to know the original password to generate this hash! They just generated a new hash of both the contents of the old hash with their new string tacked on at the end.
Well, I say just tacked on, but it's sightly more complicated than that. sha256
appends the total length of the message on to the end of the message, a 1
bit after the message, and adds 0
bits in between the length and the 1
until the total message length perfectly fills the last block. The final message length is always a multiple of 512 bits. This means that if you originally hashed chicken
, sha256
transforms that message into:
chicken\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07
(\x
is an escape sequence that takes the hexadecimal number after in and converts it into a raw byte.)
So our attacker sends this request to the server:
POST http://doorduino/unlock?nonce=42\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x78&nonce=43 HTTP/1.1
Host: doorduino
Accept: */*
X-Auth-Hash: 60be0a782cc4664e20b4c14799627b2742e437529b17ba8051fa75e018ee6a68
Receiving the Fake Request
The Arduino receives this request, converts the string into a byte array, and hashes it3:
sha256_core("secret_passwordhttp://doorduino/unlock?nonce=42".as_bytes().to_owned() +
~[0x80 as u8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x1, 0x78] +
"&nonce=43".as_bytes().to_owned(),
(
0x6a09e667,
0xbb67ae85,
0x3c6ef372,
0xa54ff53a,
0x510e527f,
0x9b05688c,
0x1f83d9ab,
0x5be0cd19
),
73);
// 60be0a782cc4664e20b4c14799627b2742e437529b17ba8051fa75e018ee6a68
The hash is valid, and contains the password, even though the hacker did not know the password at any point. Although it depends on how the Arduino is set up to parse duplicate query strings, it will likely overwrite the first definition of nonce (42\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x78
) with the latter, 43
. It will see that the nonce is larger than the original request, and unlock the door.
Protecting Against this Attack
HMAC is a way to prevent extension attacks. Instead of just hashing the content and password once, HMAC uses an algorithm that includes hashing the result of another hash, that is, nested hashing. This means that extending the outside hash is pointless, since the inside hash will still be unchanged and the server will detect the hash is invalid.
In the Wild
Although I made several assumptions along the way and my examples seems perhaps someone contrived, this attack is a real exploit that has been used against web services such as Flickr and Remember the Milk. It's a good thing to keep in mind when looking at anything that uses a hashing algorithm to sign something — SHA-256 isn't the only hashing algorithm vulnerable to this.
In practice, the protocol would probably not specify a password length, and an attacker wouldn't know the length of the password, but having to guess is fairly simple — assuming the password is less than ~200 characters, the attacker would have to try at most 200 hashes to find the right one.
2: That is, if the password weren't as simple as secret_password
.
3: This example is in Rust with our library, although in reality Arduino programs are written in that weird C variant thing. Additionally, all that weird syntax is because Rust doesn't allow for invalid UTF-8 strings, so some parts have to be raw byte arrays.