What is the name of your project?
MedLink
What is the purpose of your project?
The purpose of our project is to compare the computational requirements of realistic probabilistic record linkage at different scales, specifically thousands, millions, and billions of records. Probabilistic record linkage is a vital technique in fields such as healthcare, the social sciences, and data analytics, where accurate and efficient matching of records across multiple datasets is crucial. By conducting this comparative analysis, we aim to provide insight into the computational challenges and resource requirements of scaling probabilistic record linkage algorithms to large datasets. This research will contribute to the development of scalable record linkage solutions and inform decision-making about data management and processing strategies.
In particular, we are focusing on record linkage between the US Census and public health datasets, a linkage that is commonly made because census data (on population distributions, densities, and demographics) is so useful for assessing the accuracy and potential biases of public health data.
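Since the request is ultimately about the cost of running this kind of comparison at scale, here is a minimal sketch of the Fellegi-Sunter-style scoring we have in mind. In practice we would use a dedicated linkage tool (e.g. Splink) rather than hand-rolled scoring; the column names, blocking key, and m/u probabilities below are illustrative placeholders, not our final specification.

```python
# Minimal Fellegi-Sunter-style scoring sketch (illustrative only): block on ZIP,
# then sum per-field log-likelihood-ratio weights for each candidate pair.
# Column names and the (m, u) probabilities are placeholders, not our final spec.
import numpy as np
import pandas as pd

def agreement_weight(a: pd.Series, b: pd.Series, m: float, u: float) -> np.ndarray:
    """log2(m/u) when the field agrees, log2((1-m)/(1-u)) when it disagrees."""
    agree = a.astype(str).str.lower() == b.astype(str).str.lower()
    return np.where(agree, np.log2(m / u), np.log2((1 - m) / (1 - u)))

def score_pairs(census: pd.DataFrame, ehr: pd.DataFrame) -> pd.DataFrame:
    # Blocking on ZIP keeps the comparison space far below the full cross product.
    pairs = census.merge(ehr, on="zip", suffixes=("_cen", "_ehr"))
    total = np.zeros(len(pairs))
    for field, m, u in [("first_name", 0.95, 0.01),
                        ("last_name", 0.95, 0.005),
                        ("dob", 0.98, 0.001)]:
        total += agreement_weight(pairs[f"{field}_cen"], pairs[f"{field}_ehr"], m, u)
    pairs["match_weight"] = total
    return pairs.sort_values("match_weight", ascending=False)
```

The blocking step is what the thousands/millions/billions comparison really stresses: the number of candidate pairs, not the raw record count, drives the compute budget.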
Who is involved in the project? Which of these people will have direct access to the pseudopeople input data?
This project is being run by Prof. Josephine Bloggs (Nonesuch University, Texas) in collaboration with Drs Sue Denim and Dee Plume at the Centers for Disease Control (Atlanta). Drs Denim and Plume are experts in electronic health record (EHR) data, and will advise Prof. Bloggs on how to adapt pseudopeople's existing simulated data to be appropriately similar to the sort of noisy real-world EHR data which might be linked to real census data in future work, based on the results of this project. As a consequence, all of them will have direct access to the pseudopeople input data and to the EHR-like records derived from it.
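To make the "adapt pseudopeople's existing simulated data" step concrete, here is a rough sketch of the kind of post-processing we mean, run against the sample population that ships with pseudopeople. The generate_decennial_census() entry point is pseudopeople's documented API, but the column names (middle_initial, date_of_birth, last_name) and the corruption rates are assumptions for illustration, not a finalized protocol.

```python
# Illustrative sketch: derive an "EHR-like" noisy extract from pseudopeople output.
# Column names and corruption rates are assumptions, not a finalized protocol.
import numpy as np
import pandas as pd
import pseudopeople as psp

rng = np.random.default_rng(12345)

census = psp.generate_decennial_census()  # sample population; full US needs the requested data
ehr_like = census.copy()

# EHR extracts rarely carry every census field: drop middle initial entirely
# and blank out roughly 10% of dates of birth.
ehr_like = ehr_like.drop(columns=["middle_initial"], errors="ignore")
dob_mask = rng.random(len(ehr_like)) < 0.10
ehr_like.loc[dob_mask, "date_of_birth"] = pd.NA

# Introduce simple typos in roughly 5% of last names (swap two adjacent characters).
def swap_adjacent_chars(name):
    if not isinstance(name, str) or len(name) < 2:
        return name
    i = int(rng.integers(0, len(name) - 1))
    chars = list(name)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

typo_mask = rng.random(len(ehr_like)) < 0.05
ehr_like.loc[typo_mask, "last_name"] = ehr_like.loc[typo_mask, "last_name"].map(swap_adjacent_chars)
```

pseudopeople also exposes its own configurable noise, so some of this post-processing may turn out to be achievable by configuration rather than code.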
What funding is the project under? What expectations with respect to open access and access to data come with that funding?
Our project is funded by the National Institutes of Health, for whom we have written a Data Management and Sharing Plan. Essentially, this states that we have an obligation to share the final dataset used for the analysis. This is not the same as sharing the pseudopeople data or the healthcare data; rather, it is only those variables and rows from the merged dataset that are used in the final analysis.
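For clarity about what "the final dataset used for the analysis" means operationally, here is a minimal sketch of the extract we would deposit; the column names are hypothetical placeholders for whatever variables the final analysis actually uses.

```python
# Sketch of the shareable analysis extract; column names are hypothetical placeholders.
import pandas as pd

ANALYSIS_COLUMNS = ["pair_id", "blocking_key", "match_weight", "predicted_match", "runtime_seconds"]

def make_shareable_extract(merged: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows and variables used in the final analysis; raw pseudopeople
    fields and simulated-EHR identifiers are not retained in the deposited file."""
    used = merged.loc[merged["used_in_final_analysis"], ANALYSIS_COLUMNS]
    return used.reset_index(drop=True)
```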
We commit to:
be responsive to further questions from interested parties
deprecate and replace our version of the pseudopeople input data when a new version is released
What data would you like to request?
Full US
Rhode Island
Other (may not be available immediately)
Other data - more explanation
No response
In this hypothetical example, the one thing I think we should edit is the explanation of who will have direct access to the data. Instead of positing that "Drs Denim and Plume are tasked with preparing the public health data, which is then linked by Prof. Bloggs", let's make them advisors who don't directly access the public health data, either. (Because how would they prepare the public health data to link to simulated census data without knowing about all of the simulated people in the census data?)
So "Drs Denim and Plume are experts in electronic health record (EHR) data, and will advise Prof. Bloggs on how to adapt pseudopeople's existing simulated data to be appropriately similar to the sort of noisy real-world EHR data which might be linked to real census data in future work, based on the results of this project."
aflaxman changed the title from "[Data access request]: MedLink project" to "[EXAMPLE of a Data access request]: (Hypothetical) MedLink project --- see this for inspiration if you are requesting data" on Oct 31, 2023.