Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SleepStaging returns a yasa.Hypnogram instance #127

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

raphaelvallat
Copy link
Owner

@raphaelvallat raphaelvallat commented Dec 30, 2022

Work in progress / Do not review

This PR changes the return type of SleepStaging.predict() from a numpy.array to the newly-created yasa.Hypnogram class.

Remaining tasks

  • Rebase to master once Default dtype of pandas.Series is now a categorical #126 is merged
  • Update tests
  • Update changelog
  • Update string representation of SleepStaging
  • Update FAQ and quickstart
  • Warning in SleepStaging.predict that the hypnogram values have changed, specifically: "W" -> "WAKE", "R" -> "REM"!
  • Add example on how to add as an MNE.Annotations

@raphaelvallat raphaelvallat added the enhancement 🚧 New feature or request label Dec 30, 2022
@raphaelvallat raphaelvallat self-assigned this Dec 30, 2022
@remrama
Copy link
Collaborator

remrama commented Dec 31, 2022

unless we decide to switch to "W" and "R" as the default string in yasa.Hypnogram

I think it should stay as full spellings WAKE and REM, so SleepStaging should conform to Hypnogram. I see no huge benefit to abbreviations, and the full spellings are clearer, especially in the lower n_stages sitatuations (and should be consistent across).

Add example on how to add as an MNE.Annotations

Do you think it's possible to add a probabilities attribute to yasa.Hypnogram? As opposed to returning them separately? This would be very useful for the upcoming evaluation module and some plotting. They will both need probabilities so I see no reason to not attach them here. Plus, I think probability estimates are going to become kind of the "norm" for hypnograms moving forward, so people will want to work with them frequently. One could even make a yasa.Hypnogram with probabilities derived from multiple human scorers. Seems useful.

Of course in this case they could just be added to as_annotations output, maybe with an include_probabilities boolean argument.

@raphaelvallat
Copy link
Owner Author

SleepStaging should conform to Hypnogram

Good point. I'll make the change.

Do you think it's possible to add a probabilities attribute to yasa.Hypnogram?

Yep, also a good idea. There's going to be some tricky edge cases. For example, when using upsample, should we also upsample and interpolate the proba? Or just pass proba=None. Also, should we move the SleepStaging.plot_predict_proba to yasa.Hypnogram.plot_proba instead. But regardess of these questions I can include in this PR a simple implementation of a Hypnogram(..., proba=None) parameter

@codecov-commenter
Copy link

codecov-commenter commented Dec 31, 2022

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (6b37c63) 92.59% compared to head (3f92f8b) 92.63%.

Files Patch % Lines
yasa/staging.py 83.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
+ Coverage   92.59%   92.63%   +0.04%     
==========================================
  Files          23       23              
  Lines        3104     3136      +32     
==========================================
+ Hits         2874     2905      +31     
- Misses        230      231       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@remrama
Copy link
Collaborator

remrama commented Dec 31, 2022

should we also upsample and interpolate the proba?

Oooh, you're right, that's a weird one. I think passing None is reasonable (definitely for now, but probably forever too). I don't imagine there are many situations where someone needs the probabilities for the upsampled data?? 🤔

Also, should we move the SleepStaging.plot_predict_proba to yasa.Hypnogram.plot_proba instead?

Yes definitely. As you said, keep it where it is for now. I can submit a separate PR for that later.

@remrama
Copy link
Collaborator

remrama commented Jan 1, 2023

@raphaelvallat I don't want to throw any wrenches in at this point, but I just considered this. Maybe you can tell me why it's a bad idea?

If users aren't going to need the yasa.SleepStaging.predict_probas and plot_probas methods, is there a reason to even work with the yasa.SleepStaging class instance directly? I mean, would it be simpler (with no cost) to just have a yasa.predict_hypnogram function that uses the SleepStaging class under-the-hood to return a yasa.Hypnogram?

I'm asking now, because if we liked this, then this would be a nice time to implement it without breaking the API, because you could actually keep the normal behavior of SleepStaging.predict to return a numpy array, then just create the Hypnogram instance in the predict_hypnogram function. I imagine you could always keep the SleepStaging interface, for advanced users who want the .get_features attributes or something.

@raphaelvallat
Copy link
Owner Author

It's not a bad idea at all. I'll have to think more about it. My initial preference would be to vote no, mostly because I'm being lazy and because I will have much less time to work on YASA in the coming weeks, but also:

  • yasa.SleepStaging class is probably the most widely-used module in YASA and I don't want to remove it. I think the flexibility of the (explicit) class implementation is a big advantage. I know quite a lot of people use the get_features method too.
  • I don't want to deprecate predict_proba and plot_predict_proba right away. I guess more generally for the next release I don't want to make big changes to the SleepStaging module. Changing the return type of predict to a yasa.Hypnogram instance is already going to be a hassle for most users.

That said, I'm not against implementing the shortcut yasa.predict_hypnogram in a future release, as long as we don't deprecate the SleepStaging class.

@raphaelvallat raphaelvallat marked this pull request as ready for review January 8, 2023 20:58
@raphaelvallat
Copy link
Owner Author

@remrama PR ready for review! I still need to update the changelog and FAQ but I think this can be done in a final PR before we release v0.7 — together with some cleaning of the existing notebooks. Also, if we decide to switch to Pooch than most of the examples might change anyway.

@raphaelvallat
Copy link
Owner Author

Btw I'm removing the FutureWarning in yasa.hypno_int_to_str. I think a lot of users are going to use it to convert their integer hypnograms to a yasa.Hypnogram instance and it's annoying to have a warning every time. I was actually in that situation just a few days ago when updating some of my code.

3d5c72e

@raphaelvallat raphaelvallat mentioned this pull request Jan 8, 2023
7 tasks
@remrama
Copy link
Collaborator

remrama commented Jan 8, 2023

I'm moving to this after #130 is finalized. Then we can swap reviews 🎉

Copy link
Collaborator

@remrama remrama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raphaelvallat I forgot how rowdy this PR is. This Hypnogram class is soooo nice! Great idea and clear implementation.

I think you can merge this as soon as you'd like. Once it is in together with the evaluation module we can start to play around with them simultaneously and work out any kinks there.

My comments are minor, though I'll emphasize that I think the assertion messages are important. I love this new object-oriented approach to hypnograms, but it's also true that many users are transitioning to YASA and Python at the same time, and the object-oriented approach might be a bit more intimidating. Maybe not, but if so, a few helpful messages here-and-there might go a long way.

@@ -228,6 +232,9 @@ def __init__(self, values, n_stages=5, *, freq="30s", start=None, scorer=None):
assert isinstance(
scorer, (type(None), str, int)
), "`scorer` must be either None, or a string or an integer."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove "or" from "or a string", as in a few lines above

@@ -215,7 +219,7 @@ class Hypnogram:
'%REM': 8.9713}
"""

def __init__(self, values, n_stages=5, *, freq="30s", start=None, scorer=None):
def __init__(self, values, n_stages=5, *, freq="30s", start=None, scorer=None, proba=None):
assert isinstance(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right after checking that values is a list, I would check they are all strings, with a useful error message telling the use that "yasa expects strings for each epoch now". The first thing I tried when playing around was:

>>> yasa.Hypnogram([1, 2, 3, 4])
# AttributeError: 'int' object has no attribute 'upper'

I'm quite sure this will happen for many users when upgrading to 0.7, and there should probably be as much hand-holding as possible.

@@ -228,6 +232,9 @@ def __init__(self, values, n_stages=5, *, freq="30s", start=None, scorer=None):
assert isinstance(
scorer, (type(None), str, int)
), "`scorer` must be either None, or a string or an integer."
assert isinstance(
proba, (pd.DataFrame, type(None))
), "`proba` must be either None or a pandas.DataFrame"
if n_stages == 2:
accepted = ["W", "WAKE", "S", "SLEEP", "ART", "UNS"]
mapping = {"WAKE": 0, "SLEEP": 1, "ART": -1, "UNS": -2}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar hand-holding opportunity is here, where I think it'd be nice to inform the user that they need to specify n_stages if they are trying to do specify a hypnogram that has less than 5 possible stages.

>>> yasa.Hypnogram(["S", "W"])
# AssertionError: ['S' 'W'] do not match the accepted values for a 5 stages hypnogram: ['WAKE', 'W', 'N1', 'N2', 'N3', 'REM', 'R', 'ART', 'UNS']

Sidenote, and minor opinion, but I think the wording throughout should be "N-stage hypnogram" instead of "N stages hypnogram". It's varied in a few places, but I think stages should be singular and the hyphen should be there. To me it seems clearer that it's referring to the number of potential stages (which for many sleep researchers, having <5 is not really an obvious need).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think adding the word "unique" to the property string would be helpful: <Hypnogram | 2 epochs x 30s (1.00 minutes), 2 unique stages>

@@ -74,6 +74,10 @@ class Hypnogram:
scorer : str
An optional string indicating the scorer name. If specified, this will be set as the name
of the :py:class:`pandas.Series`, otherwise the name will be set to "Stage".
proba : :py:class:`pandas.DataFrame`
An optional dataframe with the probability of each sleep stage for each epoch in hypnogram.
Each row must sum to 1. This is automatically included if the hypnogram is created with
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be the best place to comment this, but regarding the proba that is automatically returned from SleepStaging, should this not have a Hynogram.scorer attached to it? Maybe just "YASA" or even something with the version number like "YASA-vX.X"? Or I'm not sure if you have a separate name and/or version control for the underlying sleep stager.

But in any case, I think the usefulness of the scorer attribute is that it allows for convenient comparisons, especially if someone is saving their data and looking back at it after a new stager comes out. Note also that the evaluation module plotting functions often take advantage of this attribute for plotting and dataframe organization, so using it in the returned dataframe might help out in those instances too.

@@ -280,6 +300,7 @@ def __init__(self, values, n_stages=5, *, freq="30s", start=None, scorer=None):
self._labels = labels
self._mapping = mapping
self._scorer = scorer
self._proba = proba

def __repr__(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what causes this, but note the differences in iPython:

In: pd.read_csv
Out: <function pandas.io.parsers.readers.read_csv(filepath_or_buffer: 'FilePath ...

In: h.as_int
Out:
<bound method Hypnogram.as_int of <Hypnogram | 3 epochs x 30s (1.50 minutes), 3 stages>
 - Use `.hypno` to get the string values as a pandas.Series
 - Use `.as_int()` to get the integer values as a pandas.Series
 - Use `.plot_hypnogram()` to plot the hypnogram
See the online documentation for more details.>

I often hit enter on methods without calling them, just to get a quick (if ugly) view of the arguments and such. When I do this on the Hypnogram methods it just returns the object string representation. I'm not sure what is necessary to switch it over.

for each epoch in hypnogram.
"""
return self._proba

# CLASS METHODS BELOW

def as_annotations(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MNE annotations glossary link in the docstrings is dead now. And that actually highlights an anxiety of mine about this nice and convenient as_annotations method; I really like using the BIDS standards for events files, but to-date I've struggled with comprehending the terminology in MNE between events and annotations. They have events_to_annotations and annotations_to_events methods, and I'm sure that if I look at the docs today there will be a clear differentiation, but as I recall I am consistently "confident-and-subsequently-confused" about how they handle this.

So I think this method is useful but I don't know about the name of it. Maybe it's just as_events for now? Because that is also consistent with BIDS, which is just looking for an events dataframe, and that's what this is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🚧 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants