
Phylografter user interviews

Jim Allman edited this page Sep 13, 2013 · 4 revisions

Sep 12, 2013, Jim Allman

This is a summary of interviews with five of the principal users of phylografter (listed with their respective taxonomic areas):

These notes will be organized broadly by topic. Detailed interview notes are available upon request.

Interviews were conducted via Skype, and focused on

  • best practices and lessons learned;
  • ideas for needed features; and
  • recommended roles and workflow for a larger community

We also explored some of the "pain points" reported in earlier curation feedback and comments from the Nov 2012 Curation Sprint.

The term "phylografter" will be used here to refer to both the existing phylografter tool and possible replacements with similar functionality.

BEST PRACTICES & LESSONS LEARNED

Experienced phylografter users can work quickly, but the learning curve is steep for new users. Key issues include:

  • general confusion about tasks and outcomes
  • confusing input fields (where will this information appear, and to whom?)
  • a prescribed sequence of operations that is not obvious

Experienced users consider study submission to be a one-time task, as opposed to one that needs repeated sessions. But most felt that a first-time or occasional user might need to return to the tool more than once, as they gather required data or groom trees for re-submission. In addition, OTU mapping for large studies will likely require multiple sessions.

One use case that involves multiple sessions is a curator handling a submission who needs to request data from the study's author. This typically takes a few rounds, during which the study will be in an incomplete state. (This would be reflected in the quality / fitness indicator discussed elsewhere.)

There's been a large accumulation of bad study data (tests, experiments, failed submissions, etc.). We should avoid this by clearly distinguishing test data and allowing sensible deletion of unwanted data. Perhaps this is handled under the topic of privacy / ownership below.

INTERNAL COMMUNICATION (AMONG CURATORS AND THE SYNTHESIS TEAM)

We've identified some need for communication among the community of curators and those managing synthesis. To date, this has been handled via email and informally using phylografter's tagging facility, e.g., to mark some trees for deletion or as recommended for synthesis.

Ideally we would provide some form of shared institutional memory for

  • submissions in progress
  • pending data requests from authors (incl. reminders?)
  • pending changes to taxonomy (required for OTU mapping)
  • noteworthy judgment calls or best guesses in each study
  • who's worked in each study (and when, and what they did)

Some of this information should appear in the status app. But we should also consider adding internal notes to each study -- at minimum, a text log with timestamped entries -- or using an external forum (or ticket management system?) to capture this information.
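As a rough illustration of the "text log with timestamped entries" idea, here is a minimal Python sketch. The class and field names (`StudyLog`, `LogEntry`, `curator`, `note`) are hypothetical and are not part of phylografter; the sketch only shows that a per-study log could answer "who worked on this study, when, and what did they do?"

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LogEntry:
    """One timestamped curator note (hypothetical structure)."""
    timestamp: datetime
    curator: str
    note: str


@dataclass
class StudyLog:
    """Append-only log of curator notes for a single study."""
    study_id: str
    entries: List[LogEntry] = field(default_factory=list)

    def add(self, curator: str, note: str,
            when: Optional[datetime] = None) -> LogEntry:
        # Default to the current UTC time when no timestamp is supplied.
        entry = LogEntry(when or datetime.now(timezone.utc), curator, note)
        self.entries.append(entry)
        return entry

    def by_curator(self, curator: str) -> List[LogEntry]:
        # Entries written by one curator, in insertion order.
        return [e for e in self.entries if e.curator == curator]

    def render(self) -> str:
        # One line per entry, oldest first.
        ordered = sorted(self.entries, key=lambda e: e.timestamp)
        return "\n".join(
            f"{e.timestamp:%Y-%m-%d %H:%M} [{e.curator}] {e.note}"
            for e in ordered
        )
```

A log like this could be stored alongside each study or serialized into study metadata; whether entries should support threaded replies is the open question raised in the next paragraph.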

Of course, curatorial notes could also be stored as Nexson metadata, in which case we should accommodate attaching curators' notes to any study, file, tree or node. In this case, we should discuss whether these notes are static and isolated, or whether they would allow discussion threads. [ADD THESE IDEAS TO ANNOTATION / CONTROLLED VOCABULARY DOCUMENT]

NEEDED FEATURES

All agreed that first-time users would benefit from a more intuitive (self-descriptive) user interface.

Tutorials

  • Easy/minimal version is most important
  • More advanced example, or just a link to help pages?

All users were interested in our notion of an ever-present "fitness" or quality indicator for the current study.

Easy "embargo" features for pre-publication data, and a simple publication trigger.

ROLES & PERMISSIONS

Users have conflicting feelings about whether studies and trees should be "owned" by the original submitter. While everyone recognizes that some corrections might be welcome, as well as help with OTU mapping, they're concerned about possible damage in the event that:

  • tree structure is changed (this is probably off-limits)
  • taxa are mapped incorrectly by an unqualified curator
  • a rival might vandalize data, perhaps in subtle ways

While volunteer effort is appreciated, the consensus is that the most qualified curator/editor for any study is an author.

PAIN POINTS

Delays and latency in phylografter are very irritating. These are most pronounced during OTU mapping, but there are occasional serious delays (or server timeouts) during general use.

A major frustration (esp. in microbial studies) is the need to pre-approve new taxon names in order to map trees. This creates a slow and complex bottleneck in the submission process. One suggested fix is to provide tentative mapping, i.e., "pending" submissions whose OTU mapping will periodically be tested against the latest taxonomies. It would also be great to incorporate new-taxon requests inline in phylografter.
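The tentative-mapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `retest_pending` is hypothetical, the taxonomy is modeled as a plain set of names, matching is a simple case-insensitive comparison after replacing underscores with spaces, and the taxon names in the usage example are made-up placeholders. Real OTU mapping is more involved.

```python
from typing import Dict, Iterable, Optional, Set


def retest_pending(pending_labels: Iterable[str],
                   taxonomy_names: Set[str]) -> Dict[str, Optional[str]]:
    """Re-test pending OTU labels against a taxonomy snapshot.

    Returns a dict of label -> matched taxonomy name, or None when the
    label still has no match and should remain pending.
    """
    # Case-insensitive lookup table for the current taxonomy (assumption).
    lookup = {name.lower(): name for name in taxonomy_names}
    result: Dict[str, Optional[str]] = {}
    for label in pending_labels:
        # Normalize: underscores to spaces, collapse whitespace, lowercase.
        key = " ".join(label.replace("_", " ").split()).lower()
        result[label] = lookup.get(key)
    return result


# Placeholder names for illustration only.
old_taxonomy = {"Examplea alba"}
new_taxonomy = old_taxonomy | {"Examplea nova"}
labels = ["Examplea_alba", "Examplea_nova"]

print(retest_pending(labels, old_taxonomy))  # "Examplea_nova" stays pending
print(retest_pending(labels, new_taxonomy))  # both labels now resolve
```

Running a check like this each time the taxonomy is updated would let a submission proceed immediately while its unmapped names wait for approval.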

USING OTHER TOOLS

Some curators have used third-party tools to groom tree data before submission (or re-submission). For general cleanup, Mesquite is a recommended tool. Search-and-replace features (e.g., in a text editor) are sometimes used to convert lab-specific taxon names to more standard forms, for easier OTU mapping.

Note that there's a sort of convenience threshold for doing this; it's not worth it for just a few unmapped taxa.

A preferred solution might be to incorporate some of these taxon-name transformations within the web UI, perhaps in a way that allows quick round-trip testing to see if mapping is now possible.
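A minimal Python sketch of such a round trip follows. It assumes the transformations are ordered regular-expression substitutions and that "mapping is possible" means the cleaned name appears in a set of known names; the function names, the rules, and the taxon names are all hypothetical examples, not phylografter features.

```python
import re
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement text)


def apply_rules(label: str, rules: Iterable[Rule]) -> str:
    """Apply each search-and-replace rule in order to one label."""
    for pattern, replacement in rules:
        label = re.sub(pattern, replacement, label)
    return label.strip()


def preview_mapping(labels: Iterable[str], rules: Iterable[Rule],
                    known_names: Set[str]) -> List[Tuple[str, str, bool]]:
    """Return (original, cleaned, now_mappable) for each label.

    Nothing is saved; this only previews the effect of the rules.
    """
    rules = list(rules)
    report = []
    for label in labels:
        cleaned = apply_rules(label, rules)
        report.append((label, cleaned, cleaned in known_names))
    return report


# Hypothetical rules: underscores to spaces, then drop a trailing
# lab voucher code such as " AB123".
rules = [(r"_", " "), (r"\s+[A-Z]{2}\d+$", "")]
known = {"Examplea nova"}  # placeholder taxon name

for row in preview_mapping(["Examplea_nova_AB123", "Examplea_alba"], rules, known):
    print(row)
```

Because the preview has no side effects, a curator could adjust the rules and re-run it until enough names map, then apply the rules for real.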

One curator (Chris Owen) has used external tools for tracking his work in phylografter. His spreadsheet suggests the kind of information we might show in a curator's personal "dashboard", for each study:

  • study number
  • focal group
  • first author (or first two authors)
  • date uploaded
  • rooted? or in-group designation
  • how many taxa (and how many mapped)
  • was double-checked?
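For discussion, the spreadsheet columns above could translate into a record like the following Python sketch. The class name, field names, and types are assumptions chosen for illustration; the actual dashboard design is still open.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DashboardRow:
    """One study as it might appear in a curator's dashboard (hypothetical)."""
    study_number: int
    focal_group: str
    authors: List[str]            # first author, or first two authors
    date_uploaded: date
    rooted: Optional[bool]        # None if not yet checked
    ingroup_designated: bool
    taxa_total: int
    taxa_mapped: int
    double_checked: bool = False

    def mapped_fraction(self) -> float:
        # Guard against studies with no taxa recorded yet.
        return self.taxa_mapped / self.taxa_total if self.taxa_total else 0.0

    def summary(self) -> str:
        return (f"#{self.study_number} {self.focal_group}: "
                f"{self.taxa_mapped}/{self.taxa_total} taxa mapped "
                f"({self.mapped_fraction():.0%})")
```

The mapped fraction is one candidate input for the "fitness" indicator mentioned earlier, though how that indicator is computed has not been decided.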

RELATED QUESTIONS

Currently we ask for supplemental (non-tree) data files when a study is submitted. Some users believe this is a barrier to contribution from some study authors, who might view alignment data and other files as "crown jewels". If these are truly needed for synthesis, we should explain this; otherwise, we might encourage greater participation by relaxing this requirement.

RELATED INSIGHTS

There was general agreement that a status app (currently under development) would be very useful. This should make it easy to list all studies in the system, see the relative status / quality of each, and help someone to see if a study is already in the works (to avoid duplication of effort). It's not clear whether the status app should also show private studies.

COMMUNITY OF PRACTICE

We should consider ways to help new curators learn from those with more experience, and to let everyone benefit from lessons learned and best practices.

This might also involve rewarding good work with increased visibility in the system, perhaps using a "leader board" or highlighting exemplary curation efforts on the OpenTree site. This might be a step toward rewarding the real work being done here and encouraging more.

--
