Deep learning seq-to-seq models to predict password from username using leaked password data

themains/guesspass

guesspass: predict password from username

Building on our work that uses ~881M leaked passwords to build a character-level password generator as a way to assess password strength, we approach more directly the question of how predictable a password is when we know the username. We build a supervised model that uses the username to predict the password. We then test how well the model predicts passwords for unseen usernames, assessing prediction accuracy with metrics such as edit distance.
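The model code lives in the repository's notebooks and is not reproduced here. As a hedged illustration only, assuming a character-level model with one shared vocabulary and hypothetical special tokens `PAD`, `SOS` and `EOS`, the (username, password) training pairs could be turned into fixed-length integer sequences like this:

```python
# Hypothetical preprocessing for a character-level seq-to-seq model:
# usernames are the source sequences, passwords the target sequences.
PAD, SOS, EOS = 0, 1, 2  # hypothetical special-token ids


def build_vocab(strings):
    """Map every character seen in `strings` to an id; ids 0-2 are reserved."""
    chars = sorted({c for s in strings for c in s})
    return {c: i + 3 for i, c in enumerate(chars)}


def encode(s, vocab, max_len):
    """Encode `s` as SOS + chars + EOS, truncated and right-padded to `max_len`."""
    ids = [SOS] + [vocab[c] for c in s][: max_len - 2] + [EOS]
    return ids + [PAD] * (max_len - len(ids))


pairs = [("alice", "alice123"), ("bob", "password")]  # toy data, not from the leak
vocab = build_vocab([u for u, _ in pairs] + [p for _, p in pairs])
src = [encode(u, vocab, 12) for u, _ in pairs]  # model inputs
tgt = [encode(p, vocab, 12) for _, p in pairs]  # model targets
```

Sharing one vocabulary between usernames and passwords keeps the sketch short; a real model might use separate source and target vocabularies.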

In our data, the same username can appear multiple times, for three reasons: 1. duplicates (as we combine data from multiple breaches), 2. people sign up for multiple accounts with the same username, especially a commercial email username such as a Gmail address, and these are all separate accounts, and 3. separate accounts may share the same default username, e.g., [email protected]. Splitting the data into train/test by username would address #1, but in the experiments described below we split the data randomly.
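A split by username keeps every row for a given username on the same side of the split, so duplicates cannot leak from train into test. A minimal sketch, assuming rows are (username, password) tuples; the hashing scheme and the 20% test fraction are illustrative choices, not the authors':

```python
import hashlib


def in_test(username, test_frac=0.2):
    """Deterministically assign a username to the test set by hashing it."""
    h = int(hashlib.md5(username.encode("utf-8")).hexdigest(), 16)
    return (h % 10_000) < test_frac * 10_000


def split_by_username(rows, test_frac=0.2):
    """Split (username, password) rows so each username lands entirely in train or test."""
    train, test = [], []
    for username, password in rows:
        (test if in_test(username, test_frac) else train).append((username, password))
    return train, test
```

Hashing instead of shuffling means the assignment needs no global state, so it also works when the rows are streamed in chunks.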

Experiments

Random Sample (9GB)

Simply guessing the 100 most common passwords cracks more of the 10,000 test-set passwords than the model does. For the 10,000 randomly selected test usernames, a lookup into the training data finds only 1,700 matches. On those usernames, the model's average minimum edit distance over 100 guesses is 7.28, compared with 6.23 for the lookup.
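A sketch of how these two comparisons could be computed, assuming "min. edit distance over 100 tries" means the smallest Levenshtein distance between the true password and any of the 100 guesses for that username, and that the top-100 baseline counts exact matches against the most common training passwords. The function names are hypothetical:

```python
from collections import Counter


def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def min_edit_distance(guesses, password):
    """Smallest edit distance achieved by any of the guesses."""
    return min(levenshtein(g, password) for g in guesses)


def avg_min_edit_distance(guesses_per_user, passwords):
    """Average, over test users, of the best edit distance among their guesses."""
    scores = [min_edit_distance(g, p) for g, p in zip(guesses_per_user, passwords)]
    return sum(scores) / len(scores)


def top_k_crack_rate(train_passwords, test_passwords, k=100):
    """Share of test passwords found among the k most common training passwords."""
    top_k = {p for p, _ in Counter(train_passwords).most_common(k)}
    return sum(p in top_k for p in test_passwords) / len(test_passwords)
```

For example, `min_edit_distance(["abc", "password1"], "password")` is 1, because one deletion turns the second guess into the password.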

Common Usernames

We restrict the data to common usernames, i.e., usernames that appear in our database more than 100 times.
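A minimal sketch of this filter, assuming the data fits in memory as a list of (username, password) tuples; at the scale of the full leak, the counting would more likely be done in a streaming pass or inside a database:

```python
from collections import Counter


def filter_common_usernames(rows, min_count=100):
    """Keep only rows whose username occurs more than `min_count` times."""
    counts = Counter(username for username, _ in rows)
    return [(u, p) for u, p in rows if counts[u] > min_count]
```

The comparison is strict (`>`), matching "more than 100 times": a username seen exactly 100 times is dropped.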

We randomly split the data into train and test sets and estimate a seq-to-seq model. We measure performance using edit distance and compare the model against a baseline of a simple lookup into the training dataset.
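A hedged sketch of the random split and the lookup baseline. It assumes the baseline guesses, for each test username, the passwords that username had in the training data, ordered by frequency and capped at a hypothetical `k=100`; the helper names and the 20% test fraction are illustrative:

```python
import random
from collections import Counter, defaultdict


def random_split(rows, test_frac=0.2, seed=0):
    """Row-level random split: the same username can land in both train and test."""
    rng = random.Random(seed)
    rows = rows[:]  # shuffle a copy, leave the caller's list untouched
    rng.shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]


def build_lookup(train, k=100):
    """For each training username, its k most frequent passwords, most frequent first."""
    by_user = defaultdict(Counter)
    for username, password in train:
        by_user[username][password] += 1
    return {u: [p for p, _ in c.most_common(k)] for u, c in by_user.items()}


def lookup_guesses(lookup, username):
    """Lookup baseline: stored guesses for the username, or [] if it is unseen."""
    return lookup.get(username, [])
```

Because the split is by row, not by username, a test username usually also appears in training once the data is restricted to common usernames, which is what gives the lookup baseline something to return.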

Notebooks

Authors

Rajashekar Chintalapati and Gaurav Sood