April 2nd, 2008 RaVPup |
Bookmark
I have been entertained by the brute force attempts on this server since I deployed it and have been meticulously reading the logs. The reason I have been paying so much attention to them is because there was a noticeable absence of a firewall or any other mechanism of protecting the ports from enumeration. The occurrences of brute force attacks conducted by robots or very bored people with programs are quite common on the Internet. There are myriads of infected machines out there that can be dedicated to finding more vulnerable hosts. This is the currency of the hacker world. I do not mind the attempts so much as I mind the principle of them. Brute forcing passwords over an Internet connection is the last resort of the least intelligent.
There is a common misconception among Information Technology professionals without a thorough grounding in permutations and combinations to misquote the difficulty of brute forcing a password from the length of their key. The length of the key is important, what it defines is the key space of the cryptanalysis problem. Where your key lies in that key space is another matter altogether. Let’s consider the case where there is a key of length nine, in the set of all Unicode characters. Say we are given a key which is ‘aaaaaaaaa’ and another key which is ‘zzzzzzzzz’. The key space of both these keys is the same. That is to say from elementary combinatorics that each character has 1114112 states, and each of the states are not mutually exclusive. This can be generally represented as –

For any Q which represents the events that your machine can represent and the integer n which represents the number of these events. For keys that have events which are mutually exclusive, a minor modification to the general equation yields –

We can compute the key space for both our very simple keys to be

The magnitude of the key space is 1054 and from some rough references, the fastest supercomputer available today, the Blue Gene/P which is operating at 360×1012 flops, would take 7.3×1039 seconds to brute force the entire key space, assuming an ideal efficiency of one key per flop.
That is the ideal scenario. Let’s consider a practical application, the web application. Firstly, we can eliminate the Unicode superset because most people only use one, or in the case of Jennifer, a handful of the subsets. A cursory look at the web application will usually yield the information that is required in terms of linguistics. This has already eliminated the majority of the sample space and we are now left with the problem in the target subset. Let us use the ASCII subset as an example. We can reduce the sample space further by eliminating the non-printable characters, yielding only a sample space of 5.7×1017. To gain some perspective, this would take Blue Gene/P 1.6×103 seconds to compute. It is time to make some intelligent guesses about the passwords. Determining the sample space of the web application is a key priority, as this yields the information that we would need to determine if we can reduce this key space even further. I have found several, perhaps more web applications where the passwords are limited to the alphanumeric key space. This is only a total of 369 events, or a sample space of 1.0×1014, for a nine character password. If the designers of the web application are astute, they have hashed the passwords to prevent accidental disclosure using SQL injection. This is rarely the problem it might seem at first. To give you a real world situation, an eight character MD5 hashed key was successfully brute forced on a Pentium III in under a week.
What make attacks like these possible are two factors, both generalising to hash functions. I have discussed these attacks with a few people only to receive the response that the passwords are encrypted and there is no way to recover them. Technically, this is true. They are encrypted with a cryptographic hash which differs from a normal hash in that some of the original information is lost in the hashing process. What they have failed to consider is that because the algorithm is known, we can generate our own hashes and use a known plain text attack on the hash. This is done by generating random sets of characters that match the maximum possible key space of the target and encoding each one in turn to yield a hash. The hash is then compared to the recovered target and if they are the same, the original set is the key that we are after.
Another problem that plagues hashes is that they are susceptible to collision. For any hash that can be computed from any combination of input events, there exists another set of input events that can generate the same hash. This is known as the birthday paradox and it can be combined with a reduction in sample space to yield working input sets more quickly. Web applications hash the password before storing it in a database and a successful authentication is registered when an input password hashes to the same password recorded for the account that is being enumerated. It is now possible to generate a collision with MD5 in one minute on a laptop computer, leading to the conclusion that MD5 is no longer a cryptographically secure hashing algorithm.
There are another class of problems that all hashes are susceptible to known as the time-memory problem. If we were to set aside dedicated resources computing hashes and store them in a look aside table, we would be able to save the time required to compute the hash by trading memory to store the computed hashes. The second, not so obvious way to exploit this issue is when combined with the reuse of account passwords. The problem I faced in 2001 was that someone had created another user account on a forum and was masquerading as a legitimate user. Once it became obvious that the user was an interloper, the problem of how to locate the user was presented. With some improvised thinking, I decided to compare the hash of the interloper to all hashes in the database. I first retrieved the offenders hash and then used an SQL query to locate any legitimate accounts with the same hash. The method was successful and I saw the results immediately because the responsible party changed their password while sitting right next to me. I didn’t have to wait for the results of the query. I don’t know if they realise that they disclosed themselves but I continued the ruse to give them an easy exit.
Returning to our sample keys, if the attacker were to start brute forcing at the least significant portion of the key space, with some intelligent guesses for the key space domain, they would have to compute 268+1 hashes. With the second password they would have to compute 269 hashes. This translates to 2.1×1011 operations versus 5.4×1012 operations. If however, the attacker computed the hashes from the high order of the space to the low order, the situation would be reversed. Therefore, it is important to note that without knowledge of the algorithm that the attacker is using to compute the sample space, we cannot say anything about the security of the key. This is especially true for dedicated hardware and advanced methods that are used by sophisticated attackers. There are many optimisations that can be performed based on the weakness of algorithm implementations or based on dedicated hardware such as FPGAs. The NSA has made an art of it.
We can make the hash more resistant to a birthday attack by the use of salts which are added to the hash in some way, rendering the hash a non-linear transformation of the input set. That is to say, the salt must be non-computable from the input set. In practical terms, this requires that the application compute a salt and store it internally without disclosure to the legitimate user or otherwise. The security of the hash then depends to some extent on the security of the salt. Therefore, any disclosure of the salts of the hashes by means of SQL injection, XSS or inspection of the internal state of the machine will compromise the hashes. Due to the fact the salts are added to the input before the hashing function, an input set that is equivalent to the original input set plus the salt will yield a successful attack.
In real terms, brute forcing of passwords on servers is not a serious problem unless the algorithms are highly targeted. What I mean by that is, for an attacker hoping to break short passwords, the algorithm will generally tend to start from the lowest order of the key space that is to be computed. This may be optimised with current knowledge about password selection by starting computation from an input set that is a reasonable estimation of the target input set. If the attacker knows that the administrator of the system uses the maximum possible key length, they can start their attack from that specific key length and save the computation time it would require to trial suboptimal sets. On my terms, I have eliminated the mechanisms by which they can trial input sets. If nothing else, it will push them to think outside the box and use innovative methods.
Errata 03/04/08 - It seems that after working 24:07 hours in a row, fatigue has gotten the better of me and caused me to publish the wrong equation for an exclusive set of events. The error has been fixed.
Addendum 06/04/08 - Cleared the ambiguity with regard to salts.
Posted in Computers, Mathematics | No Comments »