Hacker, Hack Thyself

codinghorror · June 2, 2017, 10:52pm

A few notes not in the post, but I wanted to mention:

Running a long term password crack on your primary GPU (the one used to drive your video) is … surprisingly painful. Even with hashcat on “multitasking friendly and slowest” mode, video performance becomes incredibly sluggish. If you really want to do a long term (as in weeks, not days) password cracking project, you DEFINITELY should build a dedicated machine for it, in my opinion. It is way, way too painful on your primary machine.

Speaking of password hash cracking “in the cloud”, Amazon’s GPUs are super anemic. One GTX 1080 Ti is worth more than three AWS g2.8xlarge instances!

Hashtype: PBKDF2-HMAC-SHA256

9473.2 kH/s   8x 1080
3737.4 kH/s   16x Tesla K80, p2.16xlarge
1730.9 kH/s   1080 Ti
1173.1 kH/s   1080
 883.3 kH/s   1070
 594.5 kH/s   RX 480
 459.6 kH/s   4x GRID K520, g2.8xlarge
 304.7 kH/s   HD 6690
 114.8 kH/s   GRID K520, g2.2xlarge

See more info about Amazon’s G2 instances and compare pricing… right now the g2.8xlarge is $2.60 per hour, or $62.40 per day. There is apparently also a new P2 instance type which has up to 16 Tesla K80 GPUs which is a little better. The 8x is $7.20 per hour, and the 16x is $14.40 per hour.

I do believe blocking the top X most common passwords is the best and most efficient strategy, but it is conceivable you could run an automated, regular offline GPU crack attempt on all accounts, and then auto-reset the passwords of those users whose passwords can be easily cracked. I would allocate at least one hour of GPU time per account, and obviously you’d want to use popular wordlists and masks to do so; brute force is out of the question. Another very clever idea, but it would not be trivial to set up.

Finally, when it comes to password generation, obviously in a perfect world we would all use magical perfectly random password generators. Barring that, for human generated passwords, I have some suggestions:

if you use a dictionary word, insert something random inside the word to make it no longer a dictionary word
avoid “number at the end” or “number at the beginning”
avoid “capitalize the first character”
try to fold something site-specific into your password for that site, as a kind of “site hash”. Don’t just concatenate words together though – insert one word at random within the other.

Let’s say you were generating a human password for, I dunno, reddit. Rather than

Redditmonkey1985

do

reddMon5891keyit

Break up the dictionary / site-specific words, capitalize other than beginning, and put the number in somewhere other than beginning or end.

Jeff_Johnson · June 3, 2017, 2:46am

So the database has the hash and the salt. Do you store a 3rd salt in say the web configuration file? I assume it would be much harder for an attacker to get this global salt value than a database backup file? Maybe I’m wrong, if they get access to your machine they have everything, but I’m working under the assumption they got a database backup file without gaining file read access to the web server.

thenrich2009 · June 3, 2017, 3:01am

All this trouble to get the password of a user in a discussion forum? And then what? Post silly messages on behalf of the user? This is worth the effort if the hacker is hoping the user is using the same password on another site the hacker is interested in.

Paul_Jimenez · June 3, 2017, 4:02am

Amazon’s GPUs may be super anemic… but they’re also more available and instantly-scalable than building hardware. You don’t have to be a nation-state to have thousands of GPUs… just enough money to afford to pay Amazon’s prices. And their pricing model makes it just as efficient (price-wise) to crack them in parallel as in series. Which suggests another metric of password difficulty, instead of time: money! How much would it cost to pay Amazon to use their GPUs to crack a password.
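To make the "money instead of time" metric concrete, here is a rough sketch. It assumes the PBKDF2-HMAC-SHA256 benchmark rates and hourly prices quoted earlier in this thread, ignores billing granularity, and the function names are made up for illustration. Real rates depend heavily on the iteration count of the hashes being attacked.

```python
# Rough cost model: dollars needed to try a given number of candidate passwords.
# Rates and prices are the figures quoted in this thread, not current AWS numbers.

def cost_to_exhaust(candidates: float, hashes_per_second: float,
                    dollars_per_hour: float) -> float:
    """Dollars spent trying `candidates` guesses on one instance type."""
    hours = candidates / hashes_per_second / 3600
    return hours * dollars_per_hour


def dollars_per_billion_guesses(hashes_per_second: float,
                                dollars_per_hour: float) -> float:
    """Convenience wrapper: price of 10**9 guesses."""
    return cost_to_exhaust(1e9, hashes_per_second, dollars_per_hour)


if __name__ == "__main__":
    # (rate in H/s, price in $/hour) as quoted above
    instances = {
        "g2.8xlarge": (459.6e3, 2.60),
        "p2.16xlarge": (3737.4e3, 14.40),
    }
    for name, (rate, price) in instances.items():
        print(f"{name}: ${dollars_per_billion_guesses(rate, price):.2f} per billion guesses")
```

With these numbers a billion guesses comes out at roughly $1.57 on the g2.8xlarge and $1.07 on the p2.16xlarge. Under this simple model, running N instances in parallel divides the wall-clock time by N but leaves the total bill unchanged.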

Wladimir_Palant · June 3, 2017, 5:36am

I know, but the table nevertheless sort of compares PBKDF, bcrypt and scrypt cracking performance without mentioning the number of iterations - that’s just pointless. I would actually like to use your numbers to validate my own approach, but without knowing the number of iterations this isn’t possible.

This article appears to have the necessary info. There is also this one which appears to be a follow-up. That’s all I know, didn’t try it out myself.

And then try to hack this user’s email account because they are likely reusing passwords.

codinghorror · June 3, 2017, 5:46am

That is a question about the built in hashcat benchmark function, see Benchmark compare algorithms and browse the actual values in the table at

https://hashcat.net/wiki/doku.php?id=example_hashes

Wladimir_Palant · June 3, 2017, 5:53am

In other words, 1000 PBKDF iterations. Good to know.
bcrypt uses 2⁵ iterations (yes, that’s 32 of them). And scrypt appears to be using 1024 iterations.

All of these values are way below current recommendations of course. In particular, given the low number of bcrypt iterations, you will probably get better results if you run it on the CPU.

trajano · June 3, 2017, 6:09pm

This is why I like:

having two-factor authentication myself for anything important.
not having to have yet another password to deal with when I go to another service (thanks for making this with OpenID Connect)

24Seven · June 4, 2017, 12:50am

First, I applaud you for actually testing your hypothesis. You abided by the most critical question in a scientific study: how do I know what I think is true is actually true? Hubris is the enemy of security and you took measures to validate your assumptions with outside sources. Kudos!

Second, I’m curious about your experiment with the expert. Did they only go through a single iteration? Did they have the usernames that go with the passwords? Did they analyze the content on Discourse for those users? Did they analyze any other sources from the same users?
If you study how this white hat hacker worked (https://arstechnica.com/security/2013/05/how-crackers-make-minced-meat-out-of-your-passwords/), you see that they ran through four different iterations wherein they analyzed cracked passwords for patterns. Did this security expert use your various suggested password patterns as part of their attack?

My argument is that improving the hashing algorithm is definitely important and will absolutely improve security, but it is a Sisyphean endeavor because it relies on a wholly unreliable source of limited capability, which is the human. As machine learning gets better at anticipating the types of passwords humans will use, it will improve the already unbelievable ability to crack passwords.

There is a saying that the safest password is one that has never been cracked. Of all the requirements you make on Discourse users perhaps the best is denying them the ability to use the top 10K cracked passwords. Expand that list, and I suspect it will greatly improve security. Unfortunately, with IoT, there are soon to be billions of passwords out there and some subset of those cracked. That means choosing a password that hasn’t been used will get tougher. So much so, that you get closer to randomly choosing a password and at that point, you hit the point of using a password manager.

Again, to add significantly more security IMO, you need to add additional factors. Where you are connecting, what you have, who you are, patterns of entry etc. Only additional factors will make password crackers mostly obsolete.

sp3nx0r · June 4, 2017, 4:42pm

Great article. We did the exact same thing over at our company, “hackers, hacking thyself”. Super eye opening, adjusted some of our IT policies and practices. This is valuable for wherever you are in infosec: software development, security researcher, or infosec employee for a company. A little elbow grease gives one great insight.

CaptainKirk · June 5, 2017, 4:47pm

Maybe we thought about this differently. Maybe we were less secure than we realized.
Now, if we lost the source code to the site, we were hosed, they would know the secret sauce.

We basically did it this way:
Given a UserName & Password, we returned a user record ONLY using UserName.

We took Password, and generated the following String:

Salt_Pre + UserName + customHash(userID, Password) + Password + UserID*X + Salt_Post

Sizes: 10 + 3-50 + 8 + 6-unl + 6 + 10 -> 40 - 50 character String, to be MD5’d by the DB.

Now, Salt_Pre and Salt_Post were UNIQUE sitewide salts.

and customHash() was a hash worked out from a Sedgewick book, and the UserID was MODed to give us a really User specific hash value.

the thought process being that if you had the DB we could not envision you getting to the password, without knowing this.

And the login code, of course added: where userId = :UserID and Passwordhash = MD5(:Password);

so what was stored in the DB was the MD5 of that final string.

Obviously you cannot alter the Username, but it is a Key anyways. And it is case sensitive.
But our thought process was to add variables that we would KNOW in the formula, but the hackers would not think of.

If I had to do it today, I would probably have a GUID table where I lookup using UserID variants to get a set of GUIDS
to add to each successive hash (for each GUID G: H = Hash(Pwd)+Hash(H)+Hash(G); ) of course with salt to start and end the process… the goal, for me is to have ENORMOUSLY long strings that get hashed into an MD5() [or much better for the final step]

IE, don’t make the user provide the obnoxious length, let the system do it. And it’s OKAY if it is COSTLY on the CPU, in fact, that is even better!

PS: Canary accounts are great, but users can create their own account with their own password. If they do that before they steal the DB and then find their account in it, they can HACK on that one account until they find the path, but our extra stuff will seem like salt. That is why we felt things should change, and be long, per customer.

riking · June 6, 2017, 4:03am

One problem here - how are you going to distinguish the “canary” users from the real users… in a way that isn’t stored in the database, which we’ve already assumed the attacker has taken.

Leto_Atreides · June 6, 2017, 5:17pm

Another interesting protection is to add asymmetric encryption.

For my backups, I have generated a GnuPG key. I do my backups using .tar.gz or 7Zip on Windows, then the archive itself is encrypted with the public key. The archive is then moved to the backup server. If the server is compromised, they need the private key to decipher it. That private key is kept on a machine not linked to any network. If I have to restore something, I copy a backup onto a USB3 disk, go to the deciphering machine, use the private key there to decrypt the backup, and then do the recovery manually.

I do the same for logs: the logs are compressed, encrypted and sent to a server that does not allow remote access. You must go to the server physically to check logs. A diff between the log on the server and the backup log is automated and checked daily. Because the log server accepts incoming logs, it is protected against denial of service in case it is spammed with logs.

Asymmetric encryption is very interesting because any data can be protected on the backup server, and you need the private key to get to the data. If you keep the private key on a machine with no remote access and no network, used only to read the data you backed up, it’s quite effective.

kb7iuj · June 6, 2017, 5:31pm

If I were to speculate, I’d say:

Salt and encrypt the username before it hits the database, where neither the salt nor the encryption algorithm sequence is stored in the database,
Use other salts and encryption sequences on canary users.

This assumes that the salting and encryption sequences themselves aren’t compromised, and that Eve isn’t watching traffic to the database looking up hashed usernames at the same time she submits an unhashed username…

jesstelford · June 6, 2017, 10:58pm

Litecoin (and derivatives: Dogecoin, et al) use scrypt as their hashing algorithm, which has greatly driven down the cost of scrypt ASIC hardware. It wouldn’t be too far-fetched to imagine them being repurposed for password cracking.

codinghorror · June 7, 2017, 2:50am

The attacker won’t know or care about the canary user… it would suffice to have a standard username pattern for the canary like “canary639303”. Give it a super easy-but-not-too-easy password like random 8 char numeric. If anyone successfully logs in as that user, email all admins a standard warning template email that their db has definitely been compromised and they should reset all passwords at minimum.
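A minimal sketch of that canary idea, assuming a username pattern like the "canary639303" example above. The function names and the `notify_admins` callback are hypothetical and stand in for whatever the real login and email code would be.

```python
import re
import secrets

# Assumed convention: canary usernames are "canary" followed by digits.
CANARY_PATTERN = re.compile(r"canary\d+")


def is_canary(username: str) -> bool:
    """True if the username follows the assumed canary naming pattern."""
    return CANARY_PATTERN.fullmatch(username) is not None


def make_canary_password(length: int = 8) -> str:
    """Random numeric password: easy to crack offline, hard to guess online."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def on_successful_login(username: str, notify_admins) -> bool:
    """Call after any successful login; returns True if the alarm was raised."""
    if is_canary(username):
        notify_admins(
            f"Canary account {username!r} logged in: the database has "
            "likely been compromised, reset all passwords at minimum."
        )
        return True
    return False
```

Nothing about the canary needs to be flagged in the database itself in this sketch; the login code recognizes it from the naming pattern alone.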

anwarlord · June 7, 2017, 3:58pm

Great article. I use bcrypt with 16 log rounds for the salt. Can you suggest what should be done to make it more secure against the hashing attacks mentioned in the article?

PatrickHuizinga · June 7, 2017, 7:31pm

Interesting. Last two days I had a “Hack yourself first” workshop by Troy Hunt where he also touched upon credentials.

His suggestion was to let the hashing of the password take X00ms. And have a dedicated hashing machine to prevent DoS attacks.

He also shared an interesting solution Dropbox chose for storing their passwords: https://blogs.dropbox.com/tech/2016/09/how-dropbox-securely-stores-your-passwords/

What they do is: AES256(global_pepper, bcrypt(unique_salt, workload_10, SHA512(password)))

The sha is because bcrypt truncates passwords to ‘just’ 72 bytes.
The bcrypt workload of 10 translates to about 100ms for them. They will increase that (or already have).
The aes encryption is done for defense in depth. The pepper is not stored near the database so attackers will need an extra breach.

Troy Hunt also showed a picture of a legit password cracking company that had ordered a literal pallet full of video cards.

codinghorror · June 7, 2017, 8:43pm

What? That’s terrible!

At up to 4 bytes per character with UTF-8 that’s only 18 characters.
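To check that arithmetic, here is a small sketch using only Python's standard library. The 72-byte limit is the figure quoted above; the helper names are made up, and whether a real system should feed bcrypt the raw digest or an encoded form of it is a separate decision this sketch does not settle.

```python
import hashlib

BCRYPT_INPUT_LIMIT = 72  # bytes, the truncation limit quoted above


def surviving_characters(password: str) -> int:
    """How many whole characters fit in the first 72 UTF-8 bytes."""
    used = 0
    count = 0
    for ch in password:
        size = len(ch.encode("utf-8"))
        if used + size > BCRYPT_INPUT_LIMIT:
            break
        used += size
        count += 1
    return count


def prehash(password: str) -> bytes:
    """SHA-512 digest of the password: always 64 bytes, whatever the input length."""
    return hashlib.sha512(password.encode("utf-8")).digest()


if __name__ == "__main__":
    long_ascii = "a" * 100
    long_emoji = "\U0001F600" * 30  # each of these characters is 4 bytes in UTF-8
    print(surviving_characters(long_ascii))   # 72
    print(surviving_characters(long_emoji))   # 18
    print(len(prehash(long_emoji)))           # 64
```

The 64-byte digest always fits under the limit, which is the reason given above for hashing with SHA-512 before bcrypt.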

Also @anwarlord you should be referring to work factor, the work factor I see recommended for bcrypt is 10 – but there is really not one correct answer, you should target a “sufficient” amount of time on the server, around 8ms.
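One way to pick a work factor by time rather than by a fixed number is to measure on your own server and take the lowest cost that reaches the target. This is only a sketch: it uses PBKDF2-HMAC-SHA256 from the standard library with 2**cost iterations as a stand-in for bcrypt, so the costs it prints are not bcrypt work factors, and the function names are invented for the example.

```python
import hashlib
import time


def time_stand_in_hash(cost: int) -> float:
    """Seconds for one PBKDF2-HMAC-SHA256 call with 2**cost iterations.

    Stand-in for a bcrypt call; swap in the real hash function to calibrate it.
    """
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark password", b"benchmark salt", 2 ** cost)
    return time.perf_counter() - start


def pick_cost(target_seconds: float, measure=time_stand_in_hash,
              lowest: int = 4, highest: int = 20) -> int:
    """Smallest cost whose measured time reaches the target, capped at `highest`."""
    for cost in range(lowest, highest + 1):
        if measure(cost) >= target_seconds:
            return cost
    return highest


if __name__ == "__main__":
    # Target taken from the figure mentioned above; adjust to taste.
    print(pick_cost(0.008))
```

Because each step doubles the work, the chosen cost can overshoot the target by up to a factor of two. Rerunning the calibration on new hardware is how the cost would be raised over time.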

Yes, here are his notes:

I don’t think there is too much voodoo to password cracking. Here are my basic notes. I usually will run these commands in this order:

hashcat -m [hashtype] -r /usr/share/hashcat/rules/best64.rule hash.txt /usr/share/wordlists/rockyou.txt
hashcat -m [hashtype] -r /usr/share/hashcat/rules/d3adhob0.rule hash.txt /usr/share/wordlists/rockyou.txt
hashcat -m [hashtype] -r /usr/share/hashcat/rules/_NSAKEY.v2.dive.rule hash.txt /usr/share/wordlists/rockyou.txt
hashcat -m [hashtype] -r /usr/share/hashcat/rules/best64.rule hash.txt /usr/share/wordlists/crackstation.txt
hashcat -m [hashtype] -r /usr/share/hashcat/rules/d3adhob0.rule hash.txt /usr/share/wordlists/crackstation.txt
hashcat -m [hashtype] -r /usr/share/hashcat/rules/_NSAKEY.v2.dive.rule hash.txt /usr/share/wordlists/crackstation.txt

What that above means is using hashcat with specific rules lists and wordlists. Using hashcat this way offers a great return for the amount of time that it takes to complete.
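If you would rather script that sequence than paste six lines, a sketch like the following builds the same commands in the same order. It only constructs the argument lists and does not run hashcat; the paths are the ones from the notes above, and `build_commands` is a made-up helper name.

```python
# Builds the six hashcat invocations listed above: every rule file against
# rockyou first, then every rule file against crackstation.

RULES = ["best64.rule", "d3adhob0.rule", "_NSAKEY.v2.dive.rule"]
WORDLISTS = ["rockyou.txt", "crackstation.txt"]


def build_commands(hash_mode: str, hash_file: str = "hash.txt") -> list:
    """Return one argument list per hashcat run; hash_mode fills the [hashtype] slot."""
    return [
        [
            "hashcat", "-m", hash_mode,
            "-r", f"/usr/share/hashcat/rules/{rule}",
            hash_file,
            f"/usr/share/wordlists/{wordlist}",
        ]
        for wordlist in WORDLISTS
        for rule in RULES
    ]


if __name__ == "__main__":
    for command in build_commands("[hashtype]"):
        print(" ".join(command))
```

Each list could be handed to `subprocess.run` once the hash type is filled in, assuming hashcat and the rule files are installed at those paths.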

Here are the two wordlists I use:

Rockyou

Crackstation (the big one)

The best64 rule comes with hashcat. The other two rulesets I use are:

NSAKEY

d3adhob0

I would create a custom wordlist with cewl and crunch and add them to rockyou/crackstation. I would guess there is a good chance that many of your users use some variation of “discourse” or other key words as their password so this approach should increase the number of cracked passwords.

For this project I was going to do a combination of the hashcat commands I’ve listed, with some custom rulesets and wordlists. I would run rockyou on my local hardware since that would be quick, and crackstation on the AWS instance. Then I would do some mask attacks.

silverbacknet · June 17, 2017, 11:30am

Wouldn’t the highest level of defense in depth be to never even export the passwords, much like many systems (for better or worse) cannot export private RSA keys? If the database needs to be restored, then the passwords need to be reset, end of story. No possibility at all of password compromise by simple export, as opposed to full database dump.