Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the get_pmcode error #18

Closed
wants to merge 5 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 36 additions & 14 deletions dataretrieval/nwis.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
import re

from dataretrieval.utils import to_str, format_datetime, update_merge, set_metadata as set_md
from .utils import query
from utils import query

WATERDATA_BASE_URL = 'https://nwis.waterdata.usgs.gov/'
WATERDATA_URL = WATERDATA_BASE_URL + 'nwis/'
Expand Down Expand Up @@ -436,9 +436,9 @@ def _iv(**kwargs):
return _read_json(response.json()), _set_metadata(response, **kwargs)


def get_pmcodes(parameterCd='All', **kwargs):
def get_pmcodes(parameterCd = None, parameterNm = None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a parameter to be defined to run now. It could break programs that expect to be able to run this function with no parameters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove kwargs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain what the difference is between parameterCd and parameterNm. parameterCd can be a list or a string. If it is All then {'group_cd':'%'} is in the payload. In all other scenarios {'parm_nm_cd':param} is added to the payload, unless parameterNm is used, then {'parm_nm_cd':f'%{param}%'}, for partial matches. Why are partial matches checked for only paramterNm? What are the differences between parameterNm and parameterCd? Functionally the only difference I see is partial matches.

Seems like it could be simplified but there is some functionality that is not clear to me.

Copy link
Author

@cjbas22 cjbas22 Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it requires a parameter to run. I want to get Jeff opinion about the best way to manage this. Currently, some functions return nothing and others return all when using the default query because for some functions querying all is slow.

I removed kwargs becasue there is no additional argument that can be used to query with the new service, it is either parameter code, or name. When querying all, we use a different url.

parameterCd allows to query based on a parameter code, or a list of parameter codes. These numbers are introduced as a string or a list of strings (some of these codes start with 0).

parameterNm allows to query using names (e.g., discharge).

Is there a better way to allow querying from name and code in the function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to do partial matches for codes, I didn't think it woudl be useful, but I'll check on this!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the nwis api has a parameter that can take a code or a name. You're splitting it up, which could make it easier for users. Some documentation around the parameters beyond data type could be make this clearer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider whether we always want to do partial matches with no ability to turn it off. I'd try to make it as close to the API as possible. When making it easier for users, try not to sacrifice functionality without reason.

"""
Return a DataFrame containing all NWIS parameter codes.
Return a DataFrame containing NWIS parameter codes.

Note: NWIS may return incorrect column names. Rename them with

Expand All @@ -447,21 +447,43 @@ def get_pmcodes(parameterCd='All', **kwargs):
Parameters (Additional parameters, if supplied, will be used as query parameters).
----------
parameterCd: string or listlike
parameterNm: string
Returns:
DataFrame containing the USGS parameter codes and Metadata as tuple
"""
payload = {'radio_pm_search' : 'pm_search',
'pm_group' : 'All+--+include+all+parameter+groups',
'pm_search' : parameterCd,
'casrn_search' : None,
'srsname_search' : None,
'show' : ['parameter_group_nm', 'casrn', 'srsname','parameter_units', 'parameter_nm'],
'format' : 'rdb'}
url = 'https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URLs are configured at the top of the page. This line is also duplicated at line 466.

Copy link
Author

@cjbas22 cjbas22 Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll define these url's at the top of the page. These (the one in line 454 and 466) are different urls and used only for this function.


payload.update(kwargs)
url = WATERDATA_URL + 'pmcodes/pmcodes'
if parameterCd is None and parameterNm is None:
raise TypeError('Query must specify a parameter code (parameterCd = ) or name (parameterNm = )')

if parameterCd is not None and parameterNm is not None:
raise TypeError('Query must specify a parameter name or number, not both)')

if parameterNm is None and parameterCd is not None: # querying based on a parameter code or list of codes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have already checked that only parameterCd or parameterNm is defined above. No need to check check again. Just check if parameterNm is defined. It will be easier to read and follow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes perfect sense, I'll fix this, and the 'else' you mention below.

if isinstance(parameterCd, str): # when a single code is given
if parameterCd.lower() == "all": # if querying ALL a different url is needed
payload = {'fmt':'rdb', 'group_cd':'%'}
url = "https://help.waterdata.usgs.gov/code/parameter_cd_query?"
else: # this is for querying a single parameter
payload = {'parm_nm_cd':parameterCd,'fmt':'rdb'}
if isinstance(parameterCd, list): # Querying with a list of parameters
l = []
for param in parameterCd:
payload = {'parm_nm_cd':param,'fmt':'rdb'}
response = query(url, payload)
if len(response.text.splitlines()) < 10: # empty query
raise TypeError('One of the parameter codes used is not valid, please try a different value')
l.append(_read_rdb(response.text))
return pd.concat(l), _set_metadata(response)

if parameterNm is not None and parameterCd is None: # querying based on a parameter name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just be an else.

parameterNm ='%{0}%'.format(parameterNm) # update to include partial matches
payload = {'parm_nm_cd':parameterNm,'fmt':'rdb'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{'fmt': 'rdb'} is always part of the payload. Set it only once, maybe just before running the query below.


response = query(url, payload)
return _read_rdb(response.text), _set_metadata(response, **kwargs)
if len(response.text.splitlines()) < 10: # empty query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check a response code instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can, the url just returns the same header with no information when there is nothing to return.

return ('The parameter code used is not valid, please try a different value')
return _read_rdb(response.text), _set_metadata(response)


def get_water_use(years="ALL", state=None, counties="ALL", categories="ALL"):
Expand Down Expand Up @@ -713,7 +735,7 @@ def _read_rdb(rdb):
break

fields = re.split("[,\t]", rdb.splitlines()[count])
dtypes = {'site_no': str, 'dec_long_va': float, 'dec_lat_va': float}
dtypes = {'site_no': str, 'dec_long_va': float, 'dec_lat_va': float, 'parm_cd': str, 'parameter_cd':str}

df = pd.read_csv(StringIO(rdb), delimiter='\t', skiprows=count + 2,
names=fields, na_values='NaN', dtype=dtypes)
Expand Down