You have two options:

Frequency analysis. If the ciphertext is long enough, you can use the known frequencies of English letters to guess parts of the key; for example, “e” is the most common letter in English text. First brute-force the key length: for each candidate length (say 1–20), split the ciphertext into that many columns, one per key letter (e.g. if the key is LEMON, the length is 5, so each column is every 5th letter of the ciphertext). Then run the frequency analysis on each column, assume its most common letter is an encrypted “e”, and subtract “e” from it (mod 26) to recover that column’s key letter.
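A minimal Python sketch of this first option, assuming the cipher is Vigenère (which the LEMON example suggests) and that only the letters A–Z matter once the text is cleaned. The helper names (clean, vigenere_decrypt, guess_key, candidate_keys) are illustrative, not from any library; the sketch simply treats the most common letter of each column as an encrypted “e”.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def clean(text: str) -> str:
    """Upper-case the text and keep only the letters A-Z."""
    return "".join(c for c in text.upper() if c in ALPHABET)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Shift each ciphertext letter back by the matching key letter (A=0, ..., Z=25)."""
    text, key = clean(ciphertext), clean(key)
    return "".join(ALPHABET[(ord(c) - ord(key[i % len(key)])) % 26] for i, c in enumerate(text))

def guess_key(ciphertext: str, key_length: int) -> str:
    """Assume the most frequent letter in each column is plaintext 'E' and derive that key letter."""
    text = clean(ciphertext)
    key = []
    for offset in range(key_length):
        column = text[offset::key_length]  # every key_length-th letter, starting at `offset`
        most_common = Counter(column).most_common(1)[0][0]
        key.append(ALPHABET[(ord(most_common) - ord("E")) % 26])
    return "".join(key)

def candidate_keys(ciphertext: str, max_length: int = 20) -> dict[int, str]:
    """One frequency-analysis key guess per key length 1..max_length (capped at the text length)."""
    n = len(clean(ciphertext))
    return {length: guess_key(ciphertext, length) for length in range(1, min(max_length, n) + 1)}
```

Print each candidate key next to the start of its decryption and look for readable English. Matching only “e” is fragile on short columns; comparing each column against the full English letter-frequency table (for example with a chi-squared statistic) is a common, sturdier variant.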
Dictionary attack. Write a brute-force algorithm that tries candidate keys from a dictionary (the key is likely to be an English word) and checks which decryption resembles English the most, using a statistical model of English letter sequences (a Markov model).
To measure how “English-like” a string is, take a large amount of English text and build a simple statistical model that stores the probability of each letter X_n given the two letters before it, X_{n-2} and X_{n-1}. Strictly speaking this is a second-order Markov chain (a trigram model) rather than a hidden Markov model, since no states are hidden. For example, “THE” is far more common than “THJ” in English, so the probability of seeing E given that the last two letters were TH is high, say 0.9: P(X_n = E | X_{n-2} = T, X_{n-1} = H) = 0.9.
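A sketch of building that table in Python, assuming plain English training text and estimating each probability as a simple ratio of counts with no smoothing. build_trigram_model is a hypothetical helper name, and the toy training string below only stands in for the large corpus you would really use.

```python
import string
from collections import Counter, defaultdict

def clean(text: str) -> str:
    """Upper-case the text and keep only the letters A-Z (no spaces or punctuation)."""
    return "".join(c for c in text.upper() if c in string.ascii_uppercase)

def build_trigram_model(training_text: str) -> dict[str, dict[str, float]]:
    """Return model[ab][c] = P(X_n = c | X_{n-2} = a, X_{n-1} = b), estimated from counts."""
    text = clean(training_text)
    followers = defaultdict(Counter)  # two-letter context -> counts of the letter that follows it
    for i in range(len(text) - 2):
        followers[text[i:i + 2]][text[i + 2]] += 1
    model = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        model[context] = {letter: n / total for letter, n in counts.items()}
    return model
```

For instance, build_trigram_model("the then that the")["TH"]["E"] is 0.75, because in that text TH is followed by E three times and by A once.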

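To build this model, take all triplets of consecutive letters from the English text (with spaces and punctuation removed, since the ciphertext usually contains only letters) and count how often each one occurs; dividing the count of THE by the number of times TH is followed by any letter gives the probability of E after TH. Then, for each candidate decryption, look up the probability of each of its triplets and average them (in practice, average their logarithms, which avoids the numeric underflow of multiplying many small probabilities), and choose the candidate with the highest score as the most English-like.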
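Putting the second option together, a minimal Python sketch: it scores each candidate decryption by the average log-probability of its trigrams (the logarithm of the geometric mean of their probabilities) and keeps the dictionary word with the best score. It assumes a Vigenère cipher; the floor probability of 1e-6 for trigrams never seen in training is an arbitrary assumption, and all function names are hypothetical.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_uppercase
UNSEEN_LOG_PROB = math.log(1e-6)  # assumed penalty for trigrams never seen in training

def clean(text: str) -> str:
    """Upper-case the text and keep only the letters A-Z."""
    return "".join(c for c in text.upper() if c in ALPHABET)

def build_log_model(training_text: str) -> dict[str, float]:
    """Map each trigram abc to log P(c | ab), estimated from counts."""
    text = clean(training_text)
    trigrams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    contexts = Counter(text[i:i + 2] for i in range(len(text) - 2))
    return {t: math.log(n / contexts[t[:2]]) for t, n in trigrams.items()}

def english_score(text: str, log_model: dict[str, float]) -> float:
    """Average log-probability of the text's trigrams; higher means more English-like."""
    text = clean(text)
    n = len(text) - 2
    if n <= 0:
        return float("-inf")
    return sum(log_model.get(text[i:i + 3], UNSEEN_LOG_PROB) for i in range(n)) / n

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Shift each ciphertext letter back by the matching key letter (A=0, ..., Z=25)."""
    text, key = clean(ciphertext), clean(key)
    return "".join(ALPHABET[(ord(c) - ord(key[i % len(key)])) % 26] for i, c in enumerate(text))

def dictionary_attack(ciphertext: str, words: list[str], log_model: dict[str, float]) -> tuple[str, str]:
    """Try every dictionary word as the key and return (best key, its decryption)."""
    keys = [clean(w) for w in words if clean(w)]  # skip entries with no letters
    best = max(keys, key=lambda k: english_score(vigenere_decrypt(ciphertext, k), log_model))
    return best, vigenere_decrypt(ciphertext, best)
```

With a toy model, dictionary_attack("LXFOPVEFRNHR", ["apple", "lemon", "orange"], build_log_model("Attack at dawn")) returns ("LEMON", "ATTACKATDAWN"); a real attack would train on a large English corpus and read its candidate keys from a word list.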

Note: since neither the key length, the key, nor the plaintext is given, this is impractical to do by hand; you need a computer to do the brute force.