GitHub - themains/guesspass: Deep learning seq-to-seq models to predict password from username using leaked password data

guesspass: predict password from username

Building on our work that uses ~881M leaked passwords to build a character-level password generator as a way to assess password strength, we approach more directly the question of how predictable a password is when the username is known. We build a supervised model that uses the username to predict the password. We then test how well the model predicts passwords for unseen usernames, measuring prediction accuracy with edit distance and similar metrics.

In our data, the same username can appear multiple times, for three reasons: 1. duplicates (as we combine multiple data breaches), 2. people sign up for multiple accounts with the same username, especially the username of a commercial email address such as Gmail, and these are all separate accounts, and 3. separate accounts may share the same default username, e.g., [email protected]. We could split the data into train/test by username to address #1, but we do not do so in the experiments described below.
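A username-level split of the kind mentioned above could be sketched as follows. This is an illustration only, not the code used in the experiments: the function names, the `(username, password)` pair layout, and the 20% test fraction are all assumptions. Hashing the username keeps every row for a given username on the same side of the split, so cross-breach duplicates cannot leak from train into test.

```python
import hashlib


def assign_split(username: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a username to 'train' or 'test' (hypothetical helper)."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_by_username(pairs, test_fraction: float = 0.2):
    """Split (username, password) pairs so that no username is in both sets."""
    train, test = [], []
    for username, password in pairs:
        side = assign_split(username, test_fraction)
        (test if side == "test" else train).append((username, password))
    return train, test
```

Because the assignment depends only on the username, rerunning the split on a larger dump leaves previously assigned usernames where they were.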

Experiments

Random Sample (9GB)

Guessing with the 100 most common passwords cracks more of the 10,000 test set passwords than the model does (see here). Of the 10,000 randomly selected usernames, lookup yields only 1,700 hits. On the usernames we are able to look up, the average minimum edit distance over 100 tries is 7.28 for the model and 6.23 for the lookup.
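The "average minimum edit distance over 100 tries" metric can be illustrated with a short sketch. The README does not say which edit distance is used, so this assumes plain Levenshtein distance with unit costs; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def min_edit_distance(guesses, true_password: str) -> int:
    """Distance from the true password to the closest of the guesses."""
    return min(edit_distance(g, true_password) for g in guesses)


def average_min_edit_distance(guesses_per_user, true_passwords) -> float:
    """Mean over users of the closest-guess distance."""
    distances = [
        min_edit_distance(guesses, truth)
        for guesses, truth in zip(guesses_per_user, true_passwords)
    ]
    return sum(distances) / len(distances)
```

Under this definition a lower value is better, and a value of 0 for a user means one of the guesses matched the password exactly.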

Common Usernames

We filter to common usernames (usernames that appear in our database more than 100 times).
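As a rough sketch of this filter, assuming the data fits in memory as a list of `(username, password)` pairs (the function name and data layout are illustrative, not taken from the repository):

```python
from collections import Counter


def filter_common_usernames(pairs, min_count: int = 100):
    """Keep rows whose username appears more than min_count times."""
    pairs = list(pairs)  # materialize, since the data is scanned twice
    counts = Counter(username for username, _ in pairs)
    return [(u, p) for u, p in pairs if counts[u] > min_count]
```

For a dump too large for memory, the same two-pass idea applies: one pass to count usernames and a second pass to stream out the matching rows.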

We randomly split the data into train and test and estimate a seq-to-seq model. We measure performance using edit distance and test against the baseline of a simple lookup into the training dataset.
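The lookup baseline can be sketched as a table from each training username to the passwords seen with it. The README does not describe how the lookup orders its candidates, so ranking by frequency is an assumption here, as are the function names.

```python
from collections import Counter, defaultdict


def build_lookup(train_pairs):
    """Map each training username to a Counter of its observed passwords."""
    table = defaultdict(Counter)
    for username, password in train_pairs:
        table[username][password] += 1
    return table


def lookup_guesses(table, username: str, k: int = 100):
    """Return up to k passwords seen with this username, most frequent first."""
    if username not in table:  # a miss: the username never appeared in training
        return []
    return [password for password, _ in table[username].most_common(k)]


def hit_rate(table, test_usernames) -> float:
    """Share of test usernames for which the lookup has any candidate."""
    test_usernames = list(test_usernames)
    hits = sum(1 for u in test_usernames if u in table)
    return hits / len(test_usernames)
```

The guesses returned by `lookup_guesses` can be scored with the same edit-distance metric as the model's guesses, which makes the two directly comparable on the usernames the lookup covers.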

Notebooks

Authors

Rajashekar Chintalapati and Gaurav Sood
