Automate generation of anonymized data set from production #1642

Open
daveoconnor opened this issue Feb 25, 2025 · 2 comments
Collaborator

daveoconnor commented Feb 25, 2025

With the changes to how commit authors' names are displayed in reports and in avatars across the site (e.g. the list of contributors on the releases page), the only way developers can get an accurate picture of how the production site will display commit authors' names is to work with data that mostly matches production.

The reason is that we use the GitHub URL from the stored commit data to associate each commit with a user account, based on the github_username value in the Users table. github_username is only populated when a user sets it themselves, or when they connect their account via SSO, in which case it is inserted automatically.
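To make the dependency concrete, here is a rough sketch of that association in plain Python. It is not the site's actual code: the dict keys `github_profile_url` and `name`, and both function names, are hypothetical stand-ins. Only `github_username` comes from the description above.

```python
from urllib.parse import urlparse


def username_from_github_url(url):
    """Return the lowercased first path segment of a github.com URL, or None."""
    if not url:
        return None
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    segments = [s for s in parsed.path.split("/") if s]
    return segments[0].lower() if segments else None


def match_authors_to_users(commit_authors, users):
    """Map each commit author's name to the user with the same GitHub username.

    Authors with no URL, or whose username nobody has set, map to None.
    """
    # Users who never set github_username cannot be matched, so skip them.
    by_username = {
        u["github_username"].lower(): u
        for u in users
        if u.get("github_username")
    }
    matches = {}
    for author in commit_authors:
        username = username_from_github_url(author.get("github_profile_url"))
        matches[author["name"]] = by_username.get(username)
    return matches
```

With local fixture data where github_username is mostly empty, nearly every author falls into the `None` case, which is why the rendered names differ from production.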

It would be useful to have an automated process that dumps the production data and anonymizes sensitive data (e.g. any passwords), which we as devs could then download, maybe from a generated link placed into Passbolt?
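As a starting point for discussion, the anonymization step might look something like the sketch below. Everything here is an assumption rather than the real schema: the `email`, `display_name` and `password` keys, the `anonymize_user` name, and the choice to keep github_username untouched so the commit association still behaves like production. The `!` prefix follows Django's convention for marking a password as unusable.

```python
import hashlib
import secrets


def anonymize_user(row, secret_salt):
    """Return a copy of a user row with personal fields replaced.

    Field names are hypothetical. github_username is deliberately kept,
    because the commit-author matching depends on it.
    """
    # Deterministic pseudonym: the same user id always maps to the same
    # placeholder, so repeated dumps stay comparable.
    digest = hashlib.sha256(f"{secret_salt}:{row['id']}".encode()).hexdigest()[:12]
    cleaned = dict(row)
    cleaned["email"] = f"user-{digest}@example.com"
    cleaned["display_name"] = f"User {digest}"
    # A value starting with "!" is never a valid Django password hash,
    # so nobody can log in with the original credentials.
    cleaned["password"] = "!" + secrets.token_hex(20)
    return cleaned
```

Whether github_username itself counts as sensitive is a judgment call; it is public on GitHub, but dropping it would defeat the purpose of the data set.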

@daveoconnor daveoconnor added the Feature New feature or request label Feb 25, 2025
@rbbeeston rbbeeston moved this to Accepted in website-v2 Feb 25, 2025
@rbbeeston
Member

@sdarwin I'm sending this to you but feel free to re-assign if someone else should do it.

Collaborator

sdarwin commented Feb 25, 2025

@daveoconnor do you agree that Django does not store passwords in the database as clear text?

Try this and let me know your opinion of the strategy.

gcloud storage ls --project=boostorg-project1 gs://boostbackups/db1/daily/boost_production*
gcloud storage cp --project=boostorg-project1 gs://boostbackups/db1/daily/boost_production.db1-2.2025-02-25-08-00-01.dump .
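On the password question above: as far as I know, Django's default hasher stores a string of the form `algorithm$iterations$salt$hash` instead of the password itself. The sketch below reproduces that layout with the standard library only. It is not Django's own code, `django_style_pbkdf2` is a made-up name, and the default iteration count is an assumption since it changes between Django versions.

```python
import base64
import hashlib
import secrets


def django_style_pbkdf2(password, salt=None, iterations=600_000):
    """Build a string laid out like Django's pbkdf2_sha256 password field."""
    # A random salt per user means equal passwords give different hashes.
    salt = salt or secrets.token_hex(11)
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), iterations
    )
    encoded = base64.b64encode(raw).decode("ascii").strip()
    return f"pbkdf2_sha256${iterations}${salt}${encoded}"
```

So a dump contains salted hashes, not clear text. Whether those hashes should still be scrubbed before sharing the dump is the open question here.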
