support for ES 7 (fixes #155) #161
@@ -0,0 +1,96 @@
{"index":{"_index":"shakespeare","_id":2}}
{"line_id":3,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"}
{"index":{"_index":"shakespeare","_id":3}}
It was necessary to make this new file because `_type` was removed in ES7. Having documents with multiple types within one index is now not allowed, and attempts to set `_type` will raise errors complaining about multiple document types (unless you explicitly include a default document type in your mapping).
It's possible this file could be irrelevant and that `es7_mapping.json` could just have some argument added to it that specifies the default document type as `line`, but I couldn't figure out how to do that in 5 minutes and this worked. Definitely would like to come back to it.
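For readers following along, here is a minimal sketch of how an ES7-friendly bulk file could be derived from an older one by dropping `_type` from each action line. It is not how the file in this PR was produced; the function name `strip_type_from_bulk` and the sample `_type` value `line` are assumptions made for illustration.

```python
import json


def strip_type_from_bulk(ndjson_text):
    """Return a bulk (NDJSON) payload with `_type` removed from action lines.

    Hypothetical helper: assumes the input alternates action lines and
    source lines, except for `delete` actions, which have no source line.
    """
    out_lines = []
    expect_source = False
    for raw in ndjson_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if expect_source:
            # Source documents are passed through untouched.
            out_lines.append(raw)
            expect_source = False
            continue
        action = json.loads(raw)
        verb = next(iter(action))  # e.g. "index", "create", "delete"
        action[verb].pop("_type", None)
        out_lines.append(json.dumps(action, separators=(",", ":")))
        expect_source = verb != "delete"
    # Bulk payloads must end with a newline.
    return "\n".join(out_lines) + "\n"


old_bulk = (
    '{"index":{"_index":"shakespeare","_type":"line","_id":2}}\n'
    '{"line_id":3,"play_name":"Henry IV"}\n'
)
print(strip_type_from_bulk(old_bulk))
```

The source lines are copied verbatim, so only the action metadata changes.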
nice, want to make an issue for that after this is merged?
created #167
NEWS.md
Outdated
## Features
### Full support for ES7.x
Maybe just "Support" without "Full"? Since new features aren't accounted for yet?
yeah good call
still have to do this I think
ugh you're right. I want Bitbucket tasks back :(
expect_true(data.table::is.data.table(outDT))
expect_true(nrow(outDT) == 4)
expect_true(nrow(outDT) == 3)
Looks like this is failing checks?
Ha yeah I kind of thought it might! This is a weird inconsistency between ES <7 and ES 7. I think what's happening is that previously you were able to have multiple document types in one index, and that plus something weird in the way we write test data was causing duplicate entries for some indices. I'll have to figure that out to get this merged.
Nope, I was totally wrong. It's a different thing. In the ES7.x one, I'm using the `keyword` type (available since ES5.x) for the field `speaker`. That tells Elasticsearch "don't pass the values through a tokenizer, treat the full text as a single level of a categorical".
For ES6 I'm using the `text` type and for ES5 I'm using the `string` type. Those default to breaking down their inputs into tokens with the standard analyzer (roughly: split on whitespace and lowercase).
So basically for the ES7 test it thinks there are three unique speakers: `henry iv`, `king`, and `westmoreland`. But for all earlier versions, the same `terms` agg gives you four levels:
          thing doc_count
1:        henry        34
2:           iv        34
3:         king        34
4: westmoreland        13
I chose the `keyword` type for this field in the ES7 mapping because Elasticsearch removed the use of `fielddata = true` to say "make it possible to do a terms agg".
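To make the tokenization difference concrete, here is a toy simulation (plain Python, no Elasticsearch involved) of how a `terms` aggregation counts documents per bucket for an analyzed field versus a `keyword` field. The speaker list is made up for illustration, and the analyzer is approximated as "lowercase, then split on whitespace", so this only sketches the effect described above and does not reproduce the test data.

```python
from collections import Counter


def simulated_terms_agg(values, analyzed):
    """Count documents per term, roughly like a `terms` aggregation.

    analyzed=True  -> each value is lowercased and split into tokens
                      (a rough stand-in for an analyzed text/string field)
    analyzed=False -> each value is kept whole (like a keyword field)
    """
    doc_counts = Counter()
    for value in values:
        if analyzed:
            # A document counts once per distinct token it contains.
            terms = set(value.lower().split())
        else:
            terms = {value}
        doc_counts.update(terms)
    return dict(doc_counts)


# Hypothetical speaker values, one per document.
speakers = ["KING HENRY IV"] * 34 + ["WESTMORELAND"] * 13

print(simulated_terms_agg(speakers, analyzed=True))
print(simulated_terms_agg(speakers, analyzed=False))
```

With the analyzed field, the multi-word speaker splits into separate buckets that each inherit its document count, which is why the bucket count differs between mappings.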
For now, I'm going to go with the approach of adding an explicit check on the version around this test. It's gross so I'll open a "come back and make this less gross" bug, but I feel like it's a thing that will:
- Give us confidence that this PR doesn't break backwards compatibility of our library with all earlier Elasticsearch versions
- Give us confidence that `uptasticsearch` can process the result of a `terms` agg from ES7.x correctly
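A version gate of the kind described could look roughly like the sketch below. The helper names are hypothetical and this is not the check added in the PR; the expected counts (3 for ES7.x, 4 for earlier versions) are the ones discussed in this thread.

```python
def major_version(version_string):
    """Extract the major version from a string like '7.3.0'."""
    return int(version_string.split(".")[0])


def expected_speaker_levels(version_string):
    """Expected number of `terms` agg buckets for the `speaker` field.

    Hypothetical helper: ES7.x maps `speaker` as keyword (3 levels here),
    while earlier versions tokenize the field (4 levels).
    """
    if major_version(version_string) >= 7:
        return 3
    return 4


for version in ["5.6.16", "6.8.2", "7.3.0"]:
    print(version, expected_speaker_levels(version))
```

A test could then compare the number of rows in the parsed result against `expected_speaker_levels(...)` instead of hard-coding one value.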
I couldn't find Travis checking 7.3.0; is that expected?
TIL the "work in progress" feature! Haven't seen that before
yep it's a new-ish feature! I of course appreciate the review, but I did open it in WIP so you'd know you didn't have to review yet
Nope that's an omission, thank you!
Ok I rebased to catch the changes in #163, so the diff for
Codecov Report
@@ Coverage Diff @@
## master #161 +/- ##
========================================
+ Coverage 92.8% 93.1% +0.3%
========================================
Files 8 8
Lines 556 595 +39
========================================
+ Hits 516 554 +38
- Misses 40 41 +1
This is working! Going to add ALL of the versions back and make this an official PR. @austin3dickey take a look whenever you have time.
response to a ``POST /_search`` request, return the total
number of docs matching the query
"""
return response_json['hits']['total']['value']
Nice
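For context on the `['hits']['total']['value']` lookup in the diff above: in ES7.x search responses, `hits.total` is an object with `value` and `relation` keys by default, whereas earlier versions return a plain integer. A minimal sketch of a single function that tolerates both shapes follows; the name `total_hits` is hypothetical, and the PR itself may well keep separate per-version functions instead.

```python
def total_hits(response_json):
    """Return the total number of matching docs from a search response.

    Handles both shapes of `hits.total`:
      - ES7.x default: {"value": 3, "relation": "eq"}
      - ES <7:         3
    """
    total = response_json["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


es7_response = {"hits": {"total": {"value": 3, "relation": "eq"}, "hits": []}}
es6_response = {"hits": {"total": 3, "hits": []}}

print(total_hits(es7_response))
print(total_hits(es6_response))
```

Note that in ES7.x `value` can be a lower bound when `relation` is `gte`, so callers that need exact counts may have to check `relation` as well.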
LGTM! Just that small NEWS tweak
Ok one more review por favor (I also miss the bitbucket "leave tasks plus approve" thing)
nice!!
This pull request aims to add support in `uptasticsearch` for Elasticsearch 7.x. See the linked issue and changes to NEWS.md for details on what has changed in ES7.x.

This PR's scope is limited to "get `uptasticsearch` code working with ES7.x". It does not include taking advantage of any ES7-specific features like new types of aggregations.

Opening this as a draft PR as it currently only addresses the R side. The Python side needs to be updated in this PR as well.