s1.md (9071B)
1 ### Set 1 2 3 #### 1.1 4 5 We want to go from hex (i.e., base 16) to base64 (i.e., base.. uh, 64). So 6 we'll change from the representation: 7 8 .. + a2 * 16^2 + a1 * 16^1 + a0 * 16^0 9 10 for some (decimal-equivalent) coefficients {a0, a1, .. } in the alphabet 11 0-9a-f, to the representation: 12 13 .. + b2 * 64^2 + b1 * 64^1 + b0 * 64^0 14 15 for other coefficients {b0, b1, .. } in the alphabet A-Za-z0-9\+/. 16 17 `xxd` is a hexdump utility; one can use `xxd -r` to go from hex to binary, or 18 `xxd -r -p` to go from hex to UTF8: 19 20 $ echo $(xxd -r -p data/s1/q1_input.txt) 21 I'm killing your brain like a poisonous mushroom 22 23 The BSD `base64` utility gets one the rest of the way. An empty diff confirms 24 equality: 25 26 $ diff <(xxd -r -p data/s1/q1_input.txt | base64) data/s1/q1_output.txt 27 28 #### 1.2 29 30 Fixed-xor just encrypts by xoring every bit with some corresponding bit. 31 An example, using `xxd -b` to output bits this time: 32 33 $ parallel -k 'echo -n {} | xxd -b' ::: FAB ICE 34 00000000: 01000110 01000001 01000010 FAB 35 00000000: 01001001 01000011 01000101 ICE 36 37 So xoring 'FAB' with 'ICE' can be done manually -- one just computes the 38 bitwise xor: 39 40 0100 0110 0100 0001 0100 0010 41 0100 1001 0100 0011 0100 0101 42 43 0000 1111 0000 0010 0000 0111 44 45 In hex that's '0f0207'; we can use `bc` to calculate it; note it's not 46 zero-padded: 47 48 $ echo 'obase=16; ibase=2; 000011110000001000000111' | bc 49 F0207 50 51 The `fixed-xor` executable included will perform the reverse task on the 52 (zero-padded) hex string: 53 54 $ fixed-xor '0f0207' $(echo -n ICE | xxd -p) | xxd -r -p 55 FAB 56 57 and running `fixed-xor` on the question input yields the following: 58 59 $ SOLUTION=$(fixed-xor $(< data/s1/q2_input0.txt) $(< data/s1/q2_input1.txt)) 60 746865206b696420646f6e277420706c6179 61 62 The UTF8 encoding is fun: 63 64 $ echo $SOLUTION | xxd -r -p 65 the kid don't play 66 67 #### 1.3 68 69 (N.b., it's easy to memorize the (approximate) ranking of the most 70 commonly used characters in English. ETAOIN SHRDLU CMFWYP etc.) 71 72 Here we want to determine which byte has been used to produce a known 73 single-byte xor'd ciphertext. One wants to score bytes by their 74 "closeness" to what might be expected given typical English character 75 frequencies, then check the scores of the ciphertext single-byte-xor'd 76 against various bytes. 77 78 Here we can grab a table of UTF8 character frequencies 79 found in English corpora on the interwebs; I used [this 80 guy's](http://www.fitaly.com/board/domper3/posts/136.html). One can then 81 "score" bytestrings by e.g. calculating the mean squared error between 82 an observed frequency distribution and this expected one. 83 84 In single-byte xor, the entire input has simply been xor'd by.. a single 85 byte. Take plaintext 'hello' and '?', for example: 86 87 $ parallel 'echo -n {} | xxd -b' ::: hello ? 88 00000000: 01101000 01100101 01101100 01101100 01101111 hello 89 00000000: 00111111 ? 90 91 Repeating '?' as many times as is necessary and manually xoring bitwise yields: 92 93 01101000 01100101 01101100 01101100 01101111 94 00111111 00111111 00111111 00111111 00111111 95 96 01010111 01011010 01010011 01010011 01010000 97 98 (N.b. it's worth noting that bitwise xor is simply addition modulo 2 -- i.e., 99 addition in GF(2), the Galois field of two elements.) 100 101 The result in UTF8, going through hex, is: 102 103 $ echo 'obase=16; ibase=2; 0101011101011010010100110101001101010000' | \ 104 bc | xxd -r -p 105 WZSSP 106 107 Since xor is its own inverse, going backwards will get us 'hello' again. 108 109 For the actual question here: given an input, one can iterate over the UTF8 110 printable bytes (in decimal, 32-126), compute the single-byte xor against it, 111 and calculate its MSE. The result with the smallest MSE is probably the byte 112 that was used to encrypt it. 113 114 Using a binary that does that, we get: 115 116 $ break-single-byte-xor $(cat data/s1/q3_input.txt | tr -d '\n') | \ 117 xxd -r -p 118 cryptopals: input similarity score is 4.760245723733781e-3 119 cryptopals: xor-ing with 88 yields 1.1213430401648154e-3 120 cryptopals: result 121 Cooking MC's like a pound of bacon 122 123 #### 1.4 124 125 The idea in detecting single-byte XOR is to look for an unusually high 126 frequency of particular bytes. Natural English sentences will produce 127 some characters (e.g. ETAOIN.., whitespace) with high frequency, and 128 these when XOR'd qgainst a single byte will always produce the same 129 result. Inputs with a large number of repeated bytes should thus be 130 considered suspect. 131 132 The `detect-single-byte-xor` executable will prune out the most suspect 133 inputs included in a file. Piping the results to `break-single-byte-xor` 134 exposes the enciphered text; some relevant output is highlighted below: 135 136 ``` 137 $ detect-single-byte-xor data/s1/q4_input.txt | \ 138 parallel --keep-order 'break-single-byte-xor {} | xxd -r -p' 139 cryptopals: suspect inputs 140 cryptopals: 7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f 141 cryptopals: input similarity score is 3.016788883590278e-3 142 cryptopals: xor-ing with 76 yields 7.568718315873016e-4 143 cryptopals: result 144 Now that the party is jumping 145 ``` 146 147 #### 1.5 148 149 Repeating-key XOR just cyclically XOR's bytes in the plaintext against 150 bytes in the key: 151 152 $ sec=$(repeating-key-xor ICE "$(< data/s1/q5_input.txt)") 153 $ echo $sec | fold -w 74 154 0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a2622632427276527 155 2a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f 156 157 Note the `repeating-key-xor` binary assumes its input isn't hex-encoded, 158 so if we want to decrypt the ciphertext we'll need to pipe it through 159 `xxd` when going in reverse: 160 161 $ repeating-key-xor ICE "$(echo $sec | xxd -r -p)" | xxd -r -p 162 Burning 'em, if you ain't quick and nimble 163 I go crazy when I hear a cymbal 164 165 #### 1.6 166 167 First, determining a keysize for a suspected repeating-key XOR'd 168 ciphertext. I use an average pairwise normalised Hamming distance; one 169 breaks the ciphertext into chunks of a specific size, then calculates 170 the number of differing bits for every pair of chunks, normalising by 171 the chunk size, and then averages them all together. 172 173 $ detect-repeating-key-xor-keysize "$(< data/s1/q6_input.txt)" 174 cryptopals: keysize of 29 yields minimum score of 2.7856894063790736 175 176 Then to guess the key itself, one chunks the input into blocks of the 177 appropriate size and transposes the result, so that every byte in a 178 transposed block has (if the ciphertext has indeed been produced by 179 repeating-key XOR, and the keysize guess is correct) been XOR'd against 180 a single byte. Doing that and breaking each block individually yields 181 the key: 182 183 $ input_hex=$(cat data/s1/q6_input.txt | base64 -d | xxd -p | tr -d '\n') 184 $ key=$(rotate 29 $input_hex | \ 185 parallel -k 'break-single-byte-xor -l {}' 2> /dev/null | tr -d "\n'") 186 $ echo $key 187 Terminator X: Bring the noise 188 189 Use `repeating-key-xor` with the key to recover the plaintext: 190 191 $ repeating-key-xor "$key" --hex "$input_hex" | xxd -r -p | head -2 192 I'm back and I'm ringin' the bell 193 A rockin' on the mike while the fly girls yell 194 195 #### 1.7 196 197 Here we're doing AES-128 decryption in ECB mode. AES (Advanced 198 Encryption Standard) is a substitution-permutation based block cipher 199 with a fixed 128 bit (i.e. 16 byte) block size; AES-128 means the 200 keysize is also 128 bits. 201 202 ECB means "electronic codebook," the simplest block cipher encryption 203 mode. One divides a message into 16-byte blocks and then encrypts each 204 block independently. 205 206 It's worth using the `openssl` command-line tool here just to get a feel 207 for it: 208 209 $ key=$(echo -n 'YELLOW SUBMARINE' | xxd -p) 210 $ openssl enc -aes-128-ecb \ 211 -a -d -K "$key" -nosalt \ 212 -in data/s1/q7_input.txt | head -2 213 I'm back and I'm ringin' the bell 214 A rockin' on the mike while the fly girls yell 215 216 The `aes` binary I've cooked up (using `cryptonite` under the hood) will 217 similarly do the trick: 218 219 $ key=$(echo -n "YELLOW SUBMARINE" | xxd -p) 220 $ ciphertext=$(cat data/s1/q7_input.txt | base64 -d | xxd -p | tr -d '\n') 221 $ aes decrypt ecb "$key" "$ciphertext" | xxd -r -p | head -2 222 I'm back and I'm ringin' the bell 223 A rockin' on the mike while the fly girls yell 224 225 #### 1.8 226 227 Under ECB mode, the same 16 bytes will always encrypt to the same 228 output, so we can look for repeating bytes to detect ECB mode-encrypted 229 ciphertext: 230 231 $ cat data/s1/q8_input.txt | parallel \ 232 'echo -n {} | fold -w 16 | printf "%s %u\n" {} $(datamash countunique 1)' | \ 233 awk '{ if ($2 < 20) { print $1 }; }' | fold -w 64 234 d880619740a8a19b7840a8a31c810a3d08649af70dc06f4fd5d2d69c744cd283 235 e2dd052f6b641dbf9d11b0348542bb5708649af70dc06f4fd5d2d69c744cd283 236 9475c9dfdbc1d46597949d9c7e82bf5a08649af70dc06f4fd5d2d69c744cd283 237 97a93eab8d6aecd566489154789a6b0308649af70dc06f4fd5d2d69c744cd283 238 d403180c98c8f6db1f2a3f9c4040deb0ab51b29933f2c123c58386b06fba186a 239 240 The `Cryptopals.Block.Tools.detectMode` function produces the same 241 result by doing much the same thing.