cryptopals

Matasano's cryptopals challenges (cryptopals.com).
git clone git://git.jtobin.io/cryptopals.git
Log | Files | Refs | README | LICENSE

s1.md (9071B)


      1 ### Set 1
      2 
      3 #### 1.1
      4 
      5 We want to go from hex (i.e., base 16) to base64 (i.e., base.. uh, 64).  So
      6 we'll change from the representation:
      7 
      8     .. + a2 * 16^2 + a1 * 16^1 + a0 * 16^0
      9 
     10 for some (decimal-equivalent) coefficients {a0, a1, .. } in the alphabet
     11 0-9a-f, to the representation:
     12 
     13     .. + b2 * 64^2 + b1 * 64^1 + b0 * 64^0
     14 
     15 for other coefficients {b0, b1, .. } in the alphabet A-Za-z0-9\+/.
     16 
     17 `xxd` is a hexdump utility; one can use `xxd -r` to go from hex to binary, or
     18 `xxd -r -p` to go from hex to UTF8:
     19 
     20     $ echo $(xxd -r -p data/s1/q1_input.txt)
     21     I'm killing your brain like a poisonous mushroom
     22 
     23 The BSD `base64` utility gets one the rest of the way.  An empty diff confirms
     24 equality:
     25 
     26     $ diff <(xxd -r -p data/s1/q1_input.txt | base64) data/s1/q1_output.txt
     27 
     28 #### 1.2
     29 
     30 Fixed-xor just encrypts by xoring every bit with some corresponding bit.
     31 An example, using `xxd -b` to output bits this time:
     32 
     33     $ parallel -k 'echo -n {} | xxd -b' ::: FAB ICE
     34     00000000: 01000110 01000001 01000010                             FAB
     35     00000000: 01001001 01000011 01000101                             ICE
     36 
     37 So xoring 'FAB' with 'ICE' can be done manually -- one just computes the
     38 bitwise xor:
     39 
     40     0100 0110 0100 0001 0100 0010
     41     0100 1001 0100 0011 0100 0101
     42 
     43     0000 1111 0000 0010 0000 0111
     44 
     45 In hex that's '0f0207'; we can use `bc` to calculate it; note it's not
     46 zero-padded:
     47 
     48     $ echo 'obase=16; ibase=2; 000011110000001000000111' | bc
     49     F0207
     50 
     51 The `fixed-xor` executable included will perform the reverse task on the
     52 (zero-padded) hex string:
     53 
     54     $ fixed-xor '0f0207' $(echo -n ICE | xxd -p) | xxd -r -p
     55     FAB
     56 
     57 and running `fixed-xor` on the question input yields the following:
     58 
     59     $ SOLUTION=$(fixed-xor $(< data/s1/q2_input0.txt) $(< data/s1/q2_input1.txt))
     60     746865206b696420646f6e277420706c6179
     61 
     62 The UTF8 encoding is fun:
     63 
     64     $ echo $SOLUTION | xxd -r -p
     65     the kid don't play
     66 
     67 #### 1.3
     68 
     69 (N.b., it's easy to memorize the (approximate) ranking of the most
     70 commonly used characters in English. ETAOIN SHRDLU CMFWYP etc.)
     71 
     72 Here we want to determine which byte has been used to produce a known
     73 single-byte xor'd ciphertext. One wants to score bytes by their
     74 "closeness" to what might be expected given typical English character
     75 frequencies, then check the scores of the ciphertext single-byte-xor'd
     76 against various bytes.
     77 
     78 Here we can grab a table of UTF8 character frequencies
     79 found in English corpora on the interwebs; I used [this
     80 guy's](http://www.fitaly.com/board/domper3/posts/136.html). One can then
     81 "score" bytestrings by e.g. calculating the mean squared error between
     82 an observed frequency distribution and this expected one.
     83 
     84 In single-byte xor, the entire input has simply been xor'd by.. a single
     85 byte. Take plaintext 'hello' and '?', for example:
     86 
     87     $ parallel 'echo -n {} | xxd -b' ::: hello ?
     88     00000000: 01101000 01100101 01101100 01101100 01101111           hello
     89     00000000: 00111111                                               ?
     90 
     91 Repeating '?' as many times as is necessary and manually xoring bitwise yields:
     92 
     93     01101000 01100101 01101100 01101100 01101111
     94     00111111 00111111 00111111 00111111 00111111
     95 
     96     01010111 01011010 01010011 01010011 01010000
     97 
     98 (N.b. it's worth noting that bitwise xor is simply addition modulo 2 -- i.e.,
     99 addition in GF(2), the Galois field of two elements.)
    100 
    101 The result in UTF8, going through hex, is:
    102 
    103     $ echo 'obase=16; ibase=2; 0101011101011010010100110101001101010000' | \
    104         bc | xxd -r -p
    105     WZSSP
    106 
    107 Since xor is its own inverse, going backwards will get us 'hello' again.
    108 
    109 For the actual question here: given an input, one can iterate over the UTF8
    110 printable bytes (in decimal, 32-126), compute the single-byte xor against it,
    111 and calculate its MSE.  The result with the smallest MSE is probably the byte
    112 that was used to encrypt it.
    113 
    114 Using a binary that does that, we get:
    115 
    116     $ break-single-byte-xor $(cat data/s1/q3_input.txt | tr -d '\n') | \
    117         xxd -r -p
    118     cryptopals: input similarity score is 4.760245723733781e-3
    119     cryptopals: xor-ing with 88 yields 1.1213430401648154e-3
    120     cryptopals: result
    121     Cooking MC's like a pound of bacon
    122 
    123 #### 1.4
    124 
    125 The idea in detecting single-byte XOR is to look for an unusually high
    126 frequency of particular bytes. Natural English sentences will produce
    127 some characters (e.g. ETAOIN.., whitespace) with high frequency, and
    128 these when XOR'd qgainst a single byte will always produce the same
    129 result. Inputs with a large number of repeated bytes should thus be
    130 considered suspect.
    131 
    132 The `detect-single-byte-xor` executable will prune out the most suspect
    133 inputs included in a file. Piping the results to `break-single-byte-xor`
    134 exposes the enciphered text; some relevant output is highlighted below:
    135 
    136     ```
    137     $ detect-single-byte-xor data/s1/q4_input.txt | \
    138         parallel --keep-order 'break-single-byte-xor {} | xxd -r -p'
    139     cryptopals: suspect inputs
    140     cryptopals: 7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f
    141     cryptopals: input similarity score is 3.016788883590278e-3
    142     cryptopals: xor-ing with 76 yields 7.568718315873016e-4
    143     cryptopals: result
    144     Now that the party is jumping
    145     ```
    146 
    147 #### 1.5
    148 
    149 Repeating-key XOR just cyclically XOR's bytes in the plaintext against
    150 bytes in the key:
    151 
    152     $ sec=$(repeating-key-xor ICE "$(< data/s1/q5_input.txt)")
    153     $ echo $sec | fold -w 74
    154     0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a2622632427276527
    155     2a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f
    156 
    157 Note the `repeating-key-xor` binary assumes its input isn't hex-encoded,
    158 so if we want to decrypt the ciphertext we'll need to pipe it through
    159 `xxd` when going in reverse:
    160 
    161     $ repeating-key-xor ICE "$(echo $sec | xxd -r -p)" | xxd -r -p
    162     Burning 'em, if you ain't quick and nimble
    163     I go crazy when I hear a cymbal
    164 
    165 #### 1.6
    166 
    167 First, determining a keysize for a suspected repeating-key XOR'd
    168 ciphertext. I use an average pairwise normalised Hamming distance; one
    169 breaks the ciphertext into chunks of a specific size, then calculates
    170 the number of differing bits for every pair of chunks, normalising by
    171 the chunk size, and then averages them all together.
    172 
    173     $ detect-repeating-key-xor-keysize "$(< data/s1/q6_input.txt)"
    174     cryptopals: keysize of 29 yields minimum score of 2.7856894063790736
    175 
    176 Then to guess the key itself, one chunks the input into blocks of the
    177 appropriate size and transposes the result, so that every byte in a
    178 transposed block has (if the ciphertext has indeed been produced by
    179 repeating-key XOR, and the keysize guess is correct) been XOR'd against
    180 a single byte. Doing that and breaking each block individually yields
    181 the key:
    182 
    183     $ input_hex=$(cat data/s1/q6_input.txt | base64 -d | xxd -p | tr -d '\n')
    184     $ key=$(rotate 29 $input_hex | \
    185         parallel -k 'break-single-byte-xor -l {}' 2> /dev/null | tr -d "\n'")
    186     $ echo $key
    187     Terminator X: Bring the noise
    188 
    189 Use `repeating-key-xor` with the key to recover the plaintext:
    190 
    191     $ repeating-key-xor "$key" --hex "$input_hex" | xxd -r -p | head -2
    192     I'm back and I'm ringin' the bell
    193     A rockin' on the mike while the fly girls yell
    194 
    195 #### 1.7
    196 
    197 Here we're doing AES-128 decryption in ECB mode. AES (Advanced
    198 Encryption Standard) is a substitution-permutation based block cipher
    199 with a fixed 128 bit (i.e. 16 byte) block size; AES-128 means the
    200 keysize is also 128 bits.
    201 
    202 ECB means "electronic codebook," the simplest block cipher encryption
    203 mode. One divides a message into 16-byte blocks and then encrypts each
    204 block independently.
    205 
    206 It's worth using the `openssl` command-line tool here just to get a feel
    207 for it:
    208 
    209     $ key=$(echo -n 'YELLOW SUBMARINE' | xxd -p)
    210     $ openssl enc -aes-128-ecb \
    211         -a -d -K "$key" -nosalt \
    212         -in data/s1/q7_input.txt | head -2
    213     I'm back and I'm ringin' the bell
    214     A rockin' on the mike while the fly girls yell
    215 
    216 The `aes` binary I've cooked up (using `cryptonite` under the hood) will
    217 similarly do the trick:
    218 
    219     $ key=$(echo -n "YELLOW SUBMARINE" | xxd -p)
    220     $ ciphertext=$(cat data/s1/q7_input.txt | base64 -d | xxd -p | tr -d '\n')
    221     $ aes decrypt ecb "$key" "$ciphertext" | xxd -r -p | head -2
    222     I'm back and I'm ringin' the bell
    223     A rockin' on the mike while the fly girls yell
    224 
    225 #### 1.8
    226 
    227 Under ECB mode, the same 16 bytes will always encrypt to the same
    228 output, so we can look for repeating bytes to detect ECB mode-encrypted
    229 ciphertext:
    230 
    231     $ cat data/s1/q8_input.txt | parallel \
    232       'echo -n {} | fold -w 16 | printf "%s %u\n" {} $(datamash countunique 1)' | \
    233       awk '{ if ($2 < 20) { print $1 }; }' | fold -w 64
    234     d880619740a8a19b7840a8a31c810a3d08649af70dc06f4fd5d2d69c744cd283
    235     e2dd052f6b641dbf9d11b0348542bb5708649af70dc06f4fd5d2d69c744cd283
    236     9475c9dfdbc1d46597949d9c7e82bf5a08649af70dc06f4fd5d2d69c744cd283
    237     97a93eab8d6aecd566489154789a6b0308649af70dc06f4fd5d2d69c744cd283
    238     d403180c98c8f6db1f2a3f9c4040deb0ab51b29933f2c123c58386b06fba186a
    239 
    240 The `Cryptopals.Block.Tools.detectMode` function produces the same
    241 result by doing much the same thing.