-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.old
420 lines (337 loc) · 16 KB
/
README.old
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420

# esse
**This project is under development and may suffer constant structural changes. I don't recommend using it right now**
Simple and efficient way to organize queries/mapping/indices/tasks based on the official elasticsearch-ruby.
## Why to use it?
Some facts to use this library:
### Don't spend time learning our DLS
You don't need to spend time learning our DSL or gem usage to start using it. All you need know is the elasticsearch syntax. You are free to build your queries/mappings/settings using JSON/RubyHash flexibility. And keeping simple any elasticsearch upgrade and its syntax changes.
### Multiple ElasticSearch Versions
You can use multiple elasticsearch servers with different versions in an elegant way. Take a look at [LINK TO TOPIC](#anchors-id-here) for more details.
### It's pure Ruby
Yeah!! Nor [activesupport](http://github.com/rails/rails/tree/master/activesupport) dependency and all its monkey patchings. But if you are using rails, suggest install `esse-rails` extension that makes things even easier. Use the [Get started with esse-rails](#anchors-id-here) for more details.
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'esse'
```
And then execute:
$ bundle install
Or install it yourself as:
$ gem install esse
## Usage
### Configuration
Example of gem configuration using the `configure` method:
```ruby
Esse.configure do |config|
config.indices_directory = 'app/indices'
config.cluster do |cluster|
cluster.index_prefix = 'illinois'
cluster.settings = {
index: {
number_of_shards: 4,
number_of_replicas: 1,
},
analysis: {
analyzer: {
esse_index: {
type: "custom",
char_filter: ["ampersand"],
filter: [
"lowercase",
"asciifolding",
"esse_english_stop",
"esse_index_shingle",
"esse_stemmer"
],
tokenizer: "standard"
},
},
filter: {
esse_index_shingle: {
token_separator: "",
type: "shingle"
},
esse_stemmer: {
type: "stemmer",
language: "English"
},
},
char_filter: {
ampersand: {
type: "mapping",
mappings: ["&=> and "]
}
}
}
}
cluster.mappings = {
dynamic_templates: [
{
esse_string_template: {
match: "*",
match_mapping_type: "string",
mapping: {
fields: {
analyzed: {
analyzer: "esse_index",
index: true,
type: "text"
}
},
ignore_above: 1024,
type: "keyword"
}
}
}
]
}
cluster.client = Elasticsearch::Client.new(host: 'http://localhost:9200')
end
end
```
Note that that if the cluster is configured without any identifier, it will be used as the `:default` cluster.
```ruby
Esse.config.cluster.settings
# => {index: {number_of_shards: 4, number_of_replicas: 1} }
# or
Esse.config.cluster(:default).settings
# => {index: {number_of_shards: 4, number_of_replicas: 1} }
```
You may also configure multiple cluster connections by specifying the identifier like the example below:
```ruby
Esse.configure do |config|
config.indices_directory = 'app/indices'
config.cluster(:il) do |cluster|
cluster.index_prefix = 'illinois'
cluster.settings = {
index: {
number_of_shards: 4,
number_of_replicas: 1,
}
}
cluster.client = Elasticsearch::Client.new(host: 'https://illinois:9200')
end
config.cluster(:fl) do |cluster|
cluster.index_prefix = 'florida'
cluster.settings = {
index: {
number_of_shards: 2,
number_of_replicas: 2,
}
}
cluster.client = Elasticsearch::Client.new(host: 'https://florida:9200')
end
end
```
And on the index you can use the `:il` or `:fl` identifier to specify which cluster you want to use.
```ruby
class Illinois::AccountsIndex < Esse::Index
self.cluster_id = :il
...
end
class Florida::AccountsIndex < Esse::Index
self.cluster_id = :fl
end
```
You can also configure it through the YAML file.
```ruby
Esse.config.load('./config/esse.yml')
# or
Esse.configure do |config|
config.load('./config/esse.yml')
config.indices_directory = 'override/indices/directory/from/yml'
end
```
And the `config/esse.yml` file should look like:
```yaml
indices_directory: "app/indices"
clusters:
il:
index_prefix: "illinois"
settings:
index:
number_of_shards: 4
number_of_replicas: 1
client:
host: "https://illinois:9200"
fl:
index_prefix: "florida"
settings:
index:
number_of_shards: 2
number_of_replicas: 2
client:
host: "https://florida:9200"
```
## Indices
### Single type per index
The mapping of elasticsearch 1.x, 2.x and 5.x allow multiple types. The single-type-per-index behaviour was introduction on es 5.x disabled as default. After es 6.x de single type per index is enabled by default but the explicit mapping is still required by using the `_doc` type. Es 7.x and above the `_doc` was totally removed and the mapping type is no longer required. The `esse` framework will automatically detect the version of elasticsearch and use the correct mapping type definition. But you can enforce the mapping type by using the `mapping_single_type` option in the `index`(Just make sure the server support it and it's configured accordingly). See [this article](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/removal-of-types.html) for more details.
```ruby
class MyIndex < Esse::Index
self.mapping_single_type = true # This only control the mapping format. Add the setting `index.mapping.single_type: true` to the index settings in elasticsearch as well if needed.
end
```
## CLI
The `esse` command line tool is available to manage indices and tasks. It's configured to load the configuration from the `Essefile`, `config/esse.rb` or `config/initializers/esse.rb` files(in that order).
```bash
Commands:
esse --version, -v # Show package version
esse generate SUBCOMMAND ...ARGS # Run generators
esse help [COMMAND] # Describe available commands or one specific command
esse index SUBCOMMAND ...ARGS # Manage indices
esse install # Generate boilerplate configuration files
Options:
-r, [--require=REQUIRE] # Require config file where the application is defined
-s, [--silent], [--no-silent] # Silent mode
-h, [--help], -?, [--usage] # Show help
```
### Generate Index
Generate an index with the following command:
```bash
❯ esse generate index <IndexName> <*doc_type>
```
List of types are optional. If not specified, the index will be created with definition on Index level with the `"default"` as type. Example:
```bash
❯ bundle exec esse generate index GeosIndex
create app/indices/geos_index.rb
```
or with multiple datasources as document types:
```bash
❯ bundle exec esse generate index GeosIndex state city
create app/indices/geos_index.rb
```
As default, the index will be create d with the collection, serializer, settings, mapping directly to the index class. By you can also specify custom arguments to better organize your code by splitting the it into multiple files.
```bash
❯ ./exec/esse generate index GeosIndex state city --settings --mappings --documents --collections
create app/indices/geos_index.rb
create app/indices/geos_index/templates/settings.json
create app/indices/geos_index/templates/mappings.json
create app/indices/geos_index/documents/state_document.rb
create app/indices/geos_index/collections/state_collection.rb
create app/indices/geos_index/documents/city_document.rb
create app/indices/geos_index/collections/city_collection.rb
```
### Index Commands
There are several commands to manage indices. The following commands are available:
```bash
❯ bundle exec esse index [ruby-2.6.9p207]
Commands:
esse index close *INDEX_CLASS # Close an index (keep the data on disk, but deny operations with the index).
esse index create *INDEX_CLASSES # Creates indices for the given classes
esse index delete *INDEX_CLASSES # Deletes indices for the given classes
esse index help [COMMAND] # Describe subcommands or one specific subcommand
esse index import *INDEX_CLASSES --context=key:value # Import documents from the given classes
esse index open *INDEX_CLASS # Open a previously closed index.
esse index reset *INDEX_CLASSES # Performs zero-downtime index resetting.
esse index update_aliases *INDEX_CLASS # Replaces all existing aliases by the given suffix
esse index update_mapping *INDEX_CLASS # Create or update a mapping
esse index update_settings *INDEX_CLASS # Closes the index for read/write operations, updates the index settings, and open it again
```
## Search
Searching is done through the `search` method on the index class. It returns an instance of `Esse::Search::Query` that can be used to build the query and retrieve the results.
```ruby
# Searching using Lucene query syntax
> query = GeosIndex.search('*', size: 10, offset: 0)
=> #<Esse::Search::Query:0x0 @definition={:size=>10, :offset=>0, :q=>"*", :index=>"esse_console_geos"}>
# Searching using DSL
> query = GeosIndex.search(body: {query: {match: {name: 'Illinois'}}}, size: 1)
=> #<Esse::Search::Query:0x0 @definition={:body=>{:query=>{:match=>{:name=>"Illinois"}}}, :size=>1, :index=>"esse_console_geos"}>
# Retrieve response hits
> query.results
=> [{"_index"=>"esse_console_geos_v1", "_type"=>"_doc", "_id"=>"IL", "_score"=>2.5433555, "_routing"=>"IL", "_source"=>{"id"=>"IL", "name"=>"Illinois"}}]
> query = GeosIndex.search(body: {query: {match: {"name.analyzed" => 'Illinois'}}}, _source: false)
=> #<Esse::Search::Query:0x0 @definition={:body=>{:query=>{:match=>{"name.analyzed"=>"Illinois"}}}, :_source=>false, :index=>"esse_console_geos"}
> query.results
=> [{"_index"=>"esse_console_geos_v1", "_type"=>"_doc", "_id"=>"IL", "_score"=>2.5433555, "_routing"=>"IL"}
# Retrieve response
> query.response
=> #<Esse::Search::Response:0x0 ...>
```
Search aggregations can be retrieved using the `aggregations` method from query response.
```ruby
> query = GeosIndex.search(body: {query: { match_all: {}}, aggregations: { names: { terms: { field: "name", size: 2 }}}}, size: 2)
=> #<Esse::Search::Query:0x0 @definition={:body=>{:query=>{:match_all=>{}}, :aggregations=>{:names=>{:terms=>{:field=>"name", :size=>2}}}}, :size=>2, :index=>"esse_console_geos"}>
> query.response.aggregations
=> {"names"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>14, "buckets"=>[{"key"=>"Alabama", "doc_count"=>1}, {"key"=>"Alaska", "doc_count"=>1}]}}
# or just use the raw response
> query.response.raw_response
=> {"took"=>1,
"timed_out"=>false,
"_shards"=>{"total"=>1, "successful"=>1, "skipped"=>0, "failed"=>0},
"hits"=>
{"total"=>{"value"=>16, "relation"=>"eq"},
"max_score"=>1.0,
"hits"=>
[{"_index"=>"esse_console_geos_v1", "_type"=>"_doc", "_id"=>"AL", "_score"=>1.0, "_routing"=>"AL", "_source"=>{"id"=>"AL", "name"=>"Alabama"}},
{"_index"=>"esse_console_geos_v1", "_type"=>"_doc", "_id"=>"AK", "_score"=>1.0, "_routing"=>"AK", "_source"=>{"id"=>"AK", "name"=>"Alaska"}}]},
"aggregations"=>{"names"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>14, "buckets"=>[{"key"=>"Alabama", "doc_count"=>1}, {"key"=>"Alaska", "doc_count"=>1}]}}}
> query.response.total
=> 16
```
Scroll queries can be performed using the `scroll_hits` method on the query.
```ruby
> query.scroll_hits(batch_size: 6, scroll: '1m') { |hits| puts hits.size }
6
6
4
=> nil
```
Searching across multiple indices can be done using the `search` method on the cluster.
Using string indices as arguments:
```ruby
> Esse.config.cluster.search('esse_*', body: {query: {match_all:{}}})
=> #<Esse::Search::Query:0x0 @definition={:body=>{:query=>{:match_all=>{}}}, :index=>"esse_*"}>
> Esse.config.cluster.search('esse_cities_index', 'esse_counties_index', body: {query: {match_all:{}}})
=> #<Esse::Search::Query:0x0 @definition={:body=>{:query=>{:match_all=>{}}}, :index=>"esse_cities_index,esse_counties_index"}>
```
Using index classes as arguments:
```ruby
> Esse.config.cluster.search(CitiesIndex, CountiesIndex, body: {query: {match_all:{}}})
=> #<Esse::Search::Query:0x0 @definition={:body=>{:query=>{:match_all=>{}}}, :index=>"esse_cities_index,esse_counties_index"}>
```
## Development
After checking out the repo, run `bin/setup` to install dependencies. The command will dependencies of all `./gemfiles/Gemfile.*`.
You can use `docker-compose` to start elasticsearch and opensearch services.
```bash
❯ docker-compose up -d
```
```bash
❯ docker-compose ps
NAME COMMAND SERVICE STATUS PORTS
esse-elasticsearch-v1-1 "/docker-entrypoint.…" elasticsearch-v1 running 9300/tcp, 0.0.0.0:9201->9200/tcp, :::9201->9200/tcp
esse-elasticsearch-v2-1 "/docker-entrypoint.…" elasticsearch-v2 running 9300/tcp, 0.0.0.0:9202->9200/tcp, :::9202->9200/tcp
esse-elasticsearch-v5-1 "/bin/bash bin/es-do…" elasticsearch-v5 running 9300/tcp, 0.0.0.0:9205->9200/tcp, :::9205->9200/tcp
esse-elasticsearch-v6-1 "/usr/local/bin/dock…" elasticsearch-v6 running 9300/tcp, 0.0.0.0:9206->9200/tcp, :::9206->9200/tcp
esse-elasticsearch-v7-1 "/bin/tini -- /usr/l…" elasticsearch-v7 running 9300/tcp, 0.0.0.0:9207->9200/tcp, :::9207->9200/tcp
esse-elasticsearch-v8-1 "/bin/tini -- /usr/l…" elasticsearch-v8 running 9300/tcp, 0.0.0.0:9208->9200/tcp, :::9208->9200/tcp
esse-opensearch-v1-1 "./opensearch-docker…" opensearch-v1 running 9300/tcp, 9600/tcp, 9650/tcp, 0.0.0.0:19201->9200/tcp, :::19201->9200/tcp
esse-opensearch-v2-1 "./opensearch-docker…" opensearch-v2 running 9300/tcp, 9600/tcp, 9650/tcp, 0.0.0.0:19202->9200/tcp, :::19202->9200/tcp
```
As you can see each elasticsearch or opensearch version is binded to a different port.
Tests and console uses the `ESSE_URL` along with `STACK_VERSION` and `BUNDLE_GEMFILE` to determine which elasticsearch/opensearch and Gemfile with the official ruby client to use:
```bash
❯ export ESSE_URL=http://localhost:9201
❯ export STACK_VERSION=7.13.2
❯ export BUNDLE_GEMFILE=gemfiles/Gemfile.v7
```
But we have `bin/run` script to simplify this process. It will automatically set the required envionment variables according to the given arguments.
```bash
❯ bin/run elasticsearch v7 ./bin/console
> ENV.values_at('ESSE_URL', 'STACK_VERSION', 'BUNDLE_GEMFILE')
=> ["http://localhost:9207", "7.13.2", "gemfiles/Gemfile.elasticsearch-7.x"]
> Esse.config.cluster.api.health.fetch('cluster_name')
=> "cluster-elasticsearch-v7"
```
And to run the tests:
```bash
❯ bin/run elasticsearch v7 bundle exec rspec
```
If you don't have elasticsearch running and want to ignore integratino tests, you can use the `STUB_STACK=<distribution>-<version>` environment variable to stup the test suite for a specific elasticsearch version. Note that all examples with `:es_version` meta data will be skipped.
```bash
❯ STUB_STACK=elasticsearch-7.0.3 bundle exec --gemfile gemfiles/Gemfile.elasticsearch-7.x rspec
```
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/marcosgz/esse.
## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).