[Data access request]: Census Bureau Provisioning (Cloud & OnPrem) #476

Open
3 of 5 tasks
Krista-Park opened this issue Nov 4, 2024 · 3 comments

@Krista-Park

What is the name of your project?

Census Bureau Provisioning

What is the purpose of your project?

The purpose of this request is to ingest the full pseudopeople dataset into the Census Bureau’s research computing environments, which include both on-premises servers and cloud-hosted computing. This will allow Census Bureau users, both internal and FSRDC users, to request the dataset for experiments requiring a population-level dataset designed for testing record linkage / entity resolution software and models.

Who is involved in the project? Which of these people will have direct access to the pseudopeople input data?

Census Bureau users will be able to request the dataset via the Census Bureau’s Data Management System (DMS). Krista Park and Anup Mathur are the internal owners of the pseudopeople data set and must approve requests to use the data. They will inform the IHME pseudopeople team of approved provisioning of the dataset.

What funding is the project under? What expectations with respect to open access and access to data come with that funding?

Data requestors may be completing projects funded by the US Federal Government or other research partners using the Federal Statistical Research Data Centers. A current list of partner and collaborating agencies for FSRDC projects is available here: https://www.census.gov/about/adrm/fsrdc/partner-and-collaborating-agencies.html .

As stated in the Census Bureau Scientific Integrity Policy https://www.census.gov/content/dam/Census/about/about-the-bureau/policies_and_notices/scientificintegrity/Census_Bureau_Scientific_Integrity_Policy.pdf, the Census Bureau is committed to ensuring the free flow of scientific information. Although this commitment to quality, accuracy, and transparency in its communication of scientific findings requires publication of “information on the specific approach, data, and models used to develop such scientific conclusions, including clear explanation of inferential procedures and, where appropriate, probabilities associated with a range of projections or scenarios,” it does not require the publication of all source data. For example, for a paper relying on pseudopeople, authors would be instructed to report the version of pseudopeople used for their analysis and their analytical process. They would be allowed to publish aggregate data and examples, but would not be allowed to further disseminate the pseudopeople dataset.

We commit to:

  • be responsive to further questions from interested parties
  • deprecate and replace our version of the pseudopeople input data when a new version is released

What data would you like to request?

  • Full US
  • Rhode Island
  • Other (may not be available immediately)

Other data - more explanation

No response

@Ironholds
Collaborator

Looks pretty good! A quick question; if this is going to be a sustained use, what's the process for updating and regenerating the dataset as the code for generating it changes? How often do you expect this to occur?

@Krista-Park
Author

We are provisioning the full dataset and allowing users to generate their own experimental datasets. When a new version of pseudopeople is released, we will work with users to time replacing the dataset and help them update the sets they have generated for their use to the new dataset.

@Ironholds
Collaborator

Gotcha! @aflaxman does that align with how the data generation/your thinking on it works? Fine by me if so.
