-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing the get_pmcode error #18
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,7 +12,7 @@ | |
import re | ||
|
||
from dataretrieval.utils import to_str, format_datetime, update_merge, set_metadata as set_md | ||
from .utils import query | ||
from utils import query | ||
|
||
WATERDATA_BASE_URL = 'https://nwis.waterdata.usgs.gov/' | ||
WATERDATA_URL = WATERDATA_BASE_URL + 'nwis/' | ||
|
@@ -436,9 +436,9 @@ def _iv(**kwargs): | |
return _read_json(response.json()), _set_metadata(response, **kwargs) | ||
|
||
|
||
def get_pmcodes(parameterCd='All', **kwargs): | ||
def get_pmcodes(parameterCd = None, parameterNm = None): | ||
""" | ||
Return a DataFrame containing all NWIS parameter codes. | ||
Return a DataFrame containing NWIS parameter codes. | ||
|
||
Note: NWIS may return incorrect column names. Rename them with | ||
|
||
|
@@ -447,21 +447,43 @@ def get_pmcodes(parameterCd='All', **kwargs): | |
Parameters (Additional parameters, if supplied, will be used as query parameters). | ||
---------- | ||
parameterCd: string or listlike | ||
parameterNm: string | ||
Returns: | ||
DataFrame containing the USGS parameter codes and Metadata as tuple | ||
""" | ||
payload = {'radio_pm_search' : 'pm_search', | ||
'pm_group' : 'All+--+include+all+parameter+groups', | ||
'pm_search' : parameterCd, | ||
'casrn_search' : None, | ||
'srsname_search' : None, | ||
'show' : ['parameter_group_nm', 'casrn', 'srsname','parameter_units', 'parameter_nm'], | ||
'format' : 'rdb'} | ||
url = 'https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. URLs are configured at the top of the page. This line is also duplicated at line 466. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll define these url's at the top of the page. These (the one in line 454 and 466) are different urls and used only for this function. |
||
|
||
payload.update(kwargs) | ||
url = WATERDATA_URL + 'pmcodes/pmcodes' | ||
if parameterCd is None and parameterNm is None: | ||
raise TypeError('Query must specify a parameter code (parameterCd = ) or name (parameterNm = )') | ||
|
||
if parameterCd is not None and parameterNm is not None: | ||
raise TypeError('Query must specify a parameter name or number, not both)') | ||
|
||
if parameterNm is None and parameterCd is not None: # querying based on a parameter code or list of codes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You have already checked that only parameterCd or parameterNm is defined above. No need to check check again. Just check if parameterNm is defined. It will be easier to read and follow. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes perfect sense, I'll fix this, and the 'else' you mention below. |
||
if isinstance(parameterCd, str): # when a single code is given | ||
if parameterCd.lower() == "all": # if querying ALL a different url is needed | ||
payload = {'fmt':'rdb', 'group_cd':'%'} | ||
url = "https://help.waterdata.usgs.gov/code/parameter_cd_query?" | ||
else: # this is for querying a single parameter | ||
payload = {'parm_nm_cd':parameterCd,'fmt':'rdb'} | ||
if isinstance(parameterCd, list): # Querying with a list of parameters | ||
l = [] | ||
for param in parameterCd: | ||
payload = {'parm_nm_cd':param,'fmt':'rdb'} | ||
response = query(url, payload) | ||
if len(response.text.splitlines()) < 10: # empty query | ||
raise TypeError('One of the parameter codes used is not valid, please try a different value') | ||
l.append(_read_rdb(response.text)) | ||
return pd.concat(l), _set_metadata(response) | ||
|
||
if parameterNm is not None and parameterCd is None: # querying based on a parameter name | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could just be an else. |
||
parameterNm ='%{0}%'.format(parameterNm) # update to include partial matches | ||
payload = {'parm_nm_cd':parameterNm,'fmt':'rdb'} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
response = query(url, payload) | ||
return _read_rdb(response.text), _set_metadata(response, **kwargs) | ||
if len(response.text.splitlines()) < 10: # empty query | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you check a response code instead? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we can, the url just returns the same header with no information when there is nothing to return. |
||
return ('The parameter code used is not valid, please try a different value') | ||
return _read_rdb(response.text), _set_metadata(response) | ||
|
||
|
||
def get_water_use(years="ALL", state=None, counties="ALL", categories="ALL"): | ||
|
@@ -713,7 +735,7 @@ def _read_rdb(rdb): | |
break | ||
|
||
fields = re.split("[,\t]", rdb.splitlines()[count]) | ||
dtypes = {'site_no': str, 'dec_long_va': float, 'dec_lat_va': float} | ||
dtypes = {'site_no': str, 'dec_long_va': float, 'dec_lat_va': float, 'parm_cd': str, 'parameter_cd':str} | ||
|
||
df = pd.read_csv(StringIO(rdb), delimiter='\t', skiprows=count + 2, | ||
names=fields, na_values='NaN', dtype=dtypes) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires a parameter to be defined to run now. It could break programs that expect to be able to run this function with no parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove kwargs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain what the difference is between parameterCd and parameterNm. parameterCd can be a list or a string. If it is
All
then{'group_cd':'%'}
is in the payload. In all other scenarios{'parm_nm_cd':param}
is added to the payload, unless parameterNm is used, then{'parm_nm_cd':f'%{param}%'}
, for partial matches. Why are partial matches checked for only paramterNm? What are the differences between parameterNm and parameterCd? Functionally the only difference I see is partial matches.Seems like it could be simplified but there is some functionality that is not clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it requires a parameter to run. I want to get Jeff opinion about the best way to manage this. Currently, some functions return nothing and others return all when using the default query because for some functions querying all is slow.
I removed kwargs becasue there is no additional argument that can be used to query with the new service, it is either parameter code, or name. When querying all, we use a different url.
parameterCd allows to query based on a parameter code, or a list of parameter codes. These numbers are introduced as a string or a list of strings (some of these codes start with 0).
parameterNm allows to query using names (e.g., discharge).
Is there a better way to allow querying from name and code in the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to do partial matches for codes, I didn't think it woudl be useful, but I'll check on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so the nwis api has a parameter that can take a code or a name. You're splitting it up, which could make it easier for users. Some documentation around the parameters beyond data type could be make this clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider whether we always want to do partial matches with no ability to turn it off. I'd try to make it as close to the API as possible. When making it easier for users, try not to sacrifice functionality without reason.