Today, I implemented the SHA-256 hashing algorithm from scratch in Rust, with Matthew Avant. "Why in the lord's name would you do this?" is a question you probably would ask. Hahaha, you're so funny! Let's talk about how SHA-256 works.
The Big Picture
SHA-256 starts by creating an array of 8 numbers, which start out as predefined constants. Let's call this array h
. SHA-256 then loops through the message in 512-bit chunks. For each chunk, it uses some math magic to modify h
according to the 512 bits of the chunk and the previous value of h
. This continues until all the chunks have been processed and h
has been totally scrambled up. It then takes this final value of h
, converts it to hexadecimal, and outputs it. With this method, the amount of memory needed doesn't balloon out of control for large files. At the same time, a small change, even at the start of the message, will completely change how the rest of the message is hashed.
The Rust Implementation
The algorithm starts off with defining the initial hash values h0
through h7
, and the round constants k
.
fn sha256(input: ~str) {
let mut h0:u32 = 0x6a09e667;
let mut h1:u32 = 0xbb67ae85;
let mut h2:u32 = 0x3c6ef372;
let mut h3:u32 = 0xa54ff53a;
let mut h4:u32 = 0x510e527f;
let mut h5:u32 = 0x9b05688c;
let mut h6:u32 = 0x1f83d9ab;
let mut h7:u32 = 0x5be0cd19;
let k = [0x428a2f98, 0x71374491, 0xb5c0fbcf, ..., 0xc67178f2];
The h0
... h7
variables are the initial values of the hash. After each round, these variables will be replaced by the output of the mathy stuff, but since this is the first round and we don't have any output yet, they need to have an initial value, which is those numbers up above.
Once the initialization is set up, we need to prepare the string, which involves converting it to an array of bytes, padding the end with zeros, and adding the length of the message in bits at the very end.
let mut msg: ~[u8] = input.as_bytes().to_owned();
let l = msg.len() as u64;
msg.push(0x80);
while (msg.len() % 64) != 56 {
msg.push(0);
}
std::io::extensions::u64_to_be_bytes(l*8, 8, |v| for i in v.iter() {
msg.push(*i);
});
The padding ensures that the 512-bit chunks are exactly full, so the last chunk isn't half-filled if the bit length of the message aren't divisible by 512.
The next step is to actually divide up the message into chunks of 64 bytes.
for c in msg.mut_chunks(64) {
Rust makes that pretty easy. For each chunk, we're going to do a bunch of mathy stuff, and get out the variables a
through h
.
let mut w: [u32,..64] = [0, ..64];
for i in range(0, 16) {
let a1: u32 = (c[4 * i] as u32) * (num::pow(2, 24) as u32);
let a2: u32 = (c[4 * i + 1] as u32) * (num::pow(2, 16) as u32);
let a3: u32 = (c[4 * i + 2] as u32) * (num::pow(2, 8) as u32);
let a4: u32 = (c[4 * i + 3] as u32);
let a: u32 = a1 + a2 + a3 +a4;
w[i] = a;
}
for i in range(16, 64) {
let s0: u32 = rotr(w[i-15], 7) ^ rotr(w[i-15], 18) ^ (w[i-15] >> 3);
let s1: u32 = rotr(w[i-2], 17) ^ rotr(w[i-2], 19) ^ (w[i-2] >> 10);
w[i] = w[i-16] + s0 + w[i-7] + s1;
}
let mut a = h0;
let mut b = h1;
let mut c = h2;
let mut d = h3;
let mut e = h4;
let mut f = h5;
let mut g = h6;
let mut h = h7;
for i in range(0, 64) {
let S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25);
let ch = (e & f) ^ (!e & g);
let temp1 = h + S1 + ch + k[i] + w[i];
let S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22);
let maj = (a & b) ^ (a & c) ^ (b & c);
let temp2 = S0 + maj;
h = g;
g = f;
f = e;
e = d + temp1;
d = c;
c = b;
b = a;
a = temp1 + temp2;
}
Once we've got the variables out, we add them to the hash state variables (modulo 2^32
, since the variables are u32
s), and loop back to the next chunk to continue.
h0 += a;
h1 += b;
h2 += c;
h3 += d;
h4 += e;
h5 += f;
h6 += g;
h7 += h;
Once all the chunks have been processed, we encode the hash vars into hexadecimal, and print them.
println!("{:x}{:x}{:x}{:x}{:x}{:x}{:x}{:x}", h0, h1, h2, h3, h4, h5, h6, h7);
That's all there is to it! In my next blog post, I'll explain a potential vulnerability in hashing when it's used for signing, such as how we used it when building doorbot.