Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the get_pmcode error #18

Closed
wants to merge 5 commits into from
Closed

Fixing the get_pmcode error #18

wants to merge 5 commits into from

Conversation

cjbas22
Copy link

@cjbas22 cjbas22 commented Apr 7, 2022

This commit addresses issue #17

This commit addresses issue #17
@sblack-usu
Copy link
Collaborator

builds are failing because of a linting error. https://github.com/UCHIC/dataretrieval/runs/5875910660?check_suite_focus=true

It'd be nice if the repo had a script to reformat files, you might have to do it manually or setup the reformatter yourself.

'srsname_search' : None,
'show' : ['parameter_group_nm', 'casrn', 'srsname','parameter_units', 'parameter_nm'],
'format' : 'rdb'}
url = 'https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URLs are configured at the top of the page. This line is also duplicated at line 466.

Copy link
Author

@cjbas22 cjbas22 Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll define these url's at the top of the page. These (the one in line 454 and 466) are different urls and used only for this function.

if parameterCd is not None and parameterNm is not None:
raise TypeError('Query must specify a parameter name or number, not both)')

if parameterNm is None and parameterCd is not None: # querying based on a parameter code or list of codes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have already checked that only parameterCd or parameterNm is defined above. No need to check check again. Just check if parameterNm is defined. It will be easier to read and follow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes perfect sense, I'll fix this, and the 'else' you mention below.

@@ -436,9 +436,9 @@ def _iv(**kwargs):
return _read_json(response.json()), _set_metadata(response, **kwargs)


def get_pmcodes(parameterCd='All', **kwargs):
def get_pmcodes(parameterCd = None, parameterNm = None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a parameter to be defined to run now. It could break programs that expect to be able to run this function with no parameters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove kwargs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain what the difference is between parameterCd and parameterNm. parameterCd can be a list or a string. If it is All then {'group_cd':'%'} is in the payload. In all other scenarios {'parm_nm_cd':param} is added to the payload, unless parameterNm is used, then {'parm_nm_cd':f'%{param}%'}, for partial matches. Why are partial matches checked for only paramterNm? What are the differences between parameterNm and parameterCd? Functionally the only difference I see is partial matches.

Seems like it could be simplified but there is some functionality that is not clear to me.

Copy link
Author

@cjbas22 cjbas22 Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it requires a parameter to run. I want to get Jeff opinion about the best way to manage this. Currently, some functions return nothing and others return all when using the default query because for some functions querying all is slow.

I removed kwargs becasue there is no additional argument that can be used to query with the new service, it is either parameter code, or name. When querying all, we use a different url.

parameterCd allows to query based on a parameter code, or a list of parameter codes. These numbers are introduced as a string or a list of strings (some of these codes start with 0).

parameterNm allows to query using names (e.g., discharge).

Is there a better way to allow querying from name and code in the function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to do partial matches for codes, I didn't think it woudl be useful, but I'll check on this!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the nwis api has a parameter that can take a code or a name. You're splitting it up, which could make it easier for users. Some documentation around the parameters beyond data type could be make this clearer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider whether we always want to do partial matches with no ability to turn it off. I'd try to make it as close to the API as possible. When making it easier for users, try not to sacrifice functionality without reason.

l.append(_read_rdb(response.text))
return pd.concat(l), _set_metadata(response)

if parameterNm is not None and parameterCd is None: # querying based on a parameter name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just be an else.


if parameterNm is not None and parameterCd is None: # querying based on a parameter name
parameterNm ='%{0}%'.format(parameterNm) # update to include partial matches
payload = {'parm_nm_cd':parameterNm,'fmt':'rdb'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{'fmt': 'rdb'} is always part of the payload. Set it only once, maybe just before running the query below.

response = query(url, payload)
return _read_rdb(response.text), _set_metadata(response, **kwargs)
if len(response.text.splitlines()) < 10: # empty query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check a response code instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can, the url just returns the same header with no information when there is nothing to return.

@sblack-usu
Copy link
Collaborator

The tests are still failing with flake8. You need to reformat the file nwis.py.

@cjbas22 cjbas22 mentioned this pull request May 2, 2022
cjbas22 added a commit that referenced this pull request Jul 5, 2022
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants