Daniel Delbarre edited this page May 15, 2024 · 12 revisions

Welcome to the Infrastructure Drop-in wiki!

For our schedule, please visit this page in our wiki 📆!

The Drop-in sessions are currently led by three teams at the Turing:

Each team has its own expertise. Below are a few examples of what we may be able to help with, along with the kinds of problems people have brought to us in the past.

If what you're looking for help with isn't on this list, do still feel free to come along and we'll see what we can do!

What the Data Wrangling team can help with

  • Best practices for structuring and organising data, including:
    • How to restructure data
    • Adopting and developing data standards
    • Advantages and disadvantages of using different formats and filetypes
    • Cleaning data
  • Approaches to developing trustworthy datasets, including:
    • Capturing and recording metadata
    • Producing effective data documentation
  • Data and code versioning for reproducibility
  • Assessing data quality issues and how to deal with them
  • Integrating data from multiple sources, including:
    • Data linkage
    • Standardisation
    • Harmonisation
  • Considering security and safety issues around data, and how to make datasets safer, including:
    • Applying pseudonymisation and anonymisation
  • How to prepare your data for sharing with collaborators and the research community
  • General coding tips (e.g., Python, R, SQL) and best practices

What PMU can help with

What REG can help with

  • General programming problems, including:
    • Identifying and fixing bugs
    • Setting up Python packages and environments
  • Cloud computing questions, including:
    • How to get started on Microsoft Azure
    • How to deploy code to the cloud
    • Cloud computing and storage costs
    • Getting access to GPU VMs on Microsoft Azure
  • Research Data Science issues, including:
    • Data collection and selection process
    • Experimental design (methods, baselines, metrics, ablation studies)
    • Moving from a general idea to a machine learning pipeline
  • Software sustainability
    • Best practices for testing, managing and packaging your code
  • Reproducible research
    • Ensuring software is developed to support long-term reproducibility
    • Including techniques for packaging and archiving code and data
  • High Performance Computing (HPC) support:
    • HPC services and resources available at the Turing
    • HPC performance (parallelism, libraries, compilers)

What TPS can help with

  • Open Science
    • Open and FAIR data
    • Working open source
    • Open publishing/Open access
    • Open methods
    • Reproducibility
  • Community Building
    • Community engagement
    • Stakeholder strategy
    • Community communications
    • Event planning and facilitation
    • Embedding EDIA
  • Citizen science
  • Ethics and responsible research and innovation
  • Project management
  • Research skills
    • Academic writing
    • Writing for wider audiences