[EXAMPLE of a Data access request]: (Hypothetical) MedLink project --- see this for inspiration if you are requesting data #221

Open
3 of 5 tasks
Ironholds opened this issue Jul 13, 2023 · 1 comment
@Ironholds
Collaborator

Ironholds commented Jul 13, 2023

What is the name of your project?

MedLink

What is the purpose of your project?

The purpose of our project is to compare the computational requirements for realistic probabilistic record linkage at different scales, specifically focusing on thousands, millions, and billions of records. Probabilistic record linkage is a vital technique used in various fields such as healthcare, social sciences, and data analytics, where accurate and efficient matching of records from multiple datasets is crucial. By conducting this comparative analysis, we aim to provide insights into the computational challenges and resource requirements associated with scaling probabilistic record linkage algorithms to handle large-scale datasets. This research will contribute to the development of scalable solutions for record linkage and inform decision-making regarding data management and processing strategies.

In particular, we are focusing on record linkage between the US Census and public health datasets. This linkage is commonly made because census data (on population distributions, densities and demographics) is useful for assessing the accuracy and potential biases of public health data.

Who is involved in the project? Which of these people will have direct access to the pseudopeople input data?

This project is being run by Prof. Josephine Bloggs (Nonesuch University, Texas) in collaboration with Drs Sue Denim and Dee Plume at the Centers for Disease Control (Atlanta). Drs Denim and Plume are experts in electronic health record (EHR) data, and will advise Prof. Bloggs on how to adapt pseudopeople's existing simulated data to be appropriately similar to the sort of noisy real-world EHR data which might be linked to real census data in future work, based on the results of this project. As a consequence, all of them will have access to both the pseudopeople input data and the EHR data.

What funding is the project under? What expectations with respect to open access and access to data come with that funding?

Our project is funded by the National Institutes of Health, for whom we have written a Data Management and Sharing Plan. Essentially, this states that we have an obligation to share the final dataset used for the analysis. This is not the same as sharing the pseudopeople data, or the healthcare data - instead, it is simply those variables and rows from the merged dataset that are used in the final analysis.

We commit to:

  • be responsive to further questions from interested parties
  • deprecate and replace our version of the pseudopeople input data when a new version is released

What data would you like to request?

  • Full US
  • Rhode Island
  • Other (may not be available immediately)

Other data - more explanation

No response

@aflaxman
Member

In this hypothetical example, the one thing I think we should edit is the explanation of who will have direct access to the data. Instead of positing that "Drs Denim and Plume are tasked with preparing the public health data, which is then linked by Prof. Bloggs", let's make them advisors who don't directly access the public health data, either. (After all, how would they prepare the public health data for linking to simulated census data without knowing about all of the simulated people in the census data?)

So "Drs Denim and Plume are experts in electronic health record (EHR) data, and will advise Prof. Bloggs on how to adapt pseudopeople's existing simulated data to be appropriately similar to the sort of noisy real-world EHR data which might be linked to real census data in future work, based on the results of this project."

@aflaxman changed the title from "[Data access request]: MedLink project" to "[EXAMPLE of a Data access request]: (Hypothetical) MedLink project --- see this for inspiration if you are requesting data" on Oct 31, 2023
@Ironholds mentioned this issue on Jul 10, 2024