{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "default_pipeline": "biz_timestamp_pipeline",
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": false,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true,
          "ignore_pinyin_offset": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "vendorName": {
        "type": "text",
        "analyzer": "pinyin_analyzer",
        "search_analyzer": "pinyin_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
Example 1:
Chinese text: 刘德华阿里巴巴
Analysis result:
{
  "tokens": [
    { "token": "l", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "liu", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "刘德华阿里巴巴", "start_offset": 0, "end_offset": 7, "type": "word", "position": 0 },
    { "token": "ldhalbb", "start_offset": 0, "end_offset": 7, "type": "word", "position": 0 },
    { "token": "d", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "de", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "h", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "hua", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "a", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "li", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "b", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 },
    { "token": "ba", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 }
  ]
}
Query:
{
  "query": {
    "match_phrase": {
      "vendorName": {
        "query": "ldha"
      }
    }
  }
}
As you can see, the analysis result contains the first letters ldha, but the query returns no results. My guess is that the first letter a of "阿" is being affected by the a in "华" (hua), and that is why nothing matches.
Example 2:
Chinese text: 深圳健安医药有限公司
Analysis result:
{
  "tokens": [
    { "token": "s", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "shen", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "深圳健安医药有限公司", "start_offset": 0, "end_offset": 10, "type": "word", "position": 0 },
    { "token": "szjayyyxgs", "start_offset": 0, "end_offset": 10, "type": "word", "position": 0 },
    { "token": "z", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "zhen", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "j", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "jian", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "a", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "an", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "y", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "yi", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "yao", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 },
    { "token": "you", "start_offset": 6, "end_offset": 7, "type": "word", "position": 6 },
    { "token": "x", "start_offset": 7, "end_offset": 8, "type": "word", "position": 7 },
    { "token": "xian", "start_offset": 7, "end_offset": 8, "type": "word", "position": 7 },
    { "token": "g", "start_offset": 8, "end_offset": 9, "type": "word", "position": 8 },
    { "token": "gong", "start_offset": 8, "end_offset": 9, "type": "word", "position": 8 },
    { "token": "si", "start_offset": 9, "end_offset": 10, "type": "word", "position": 9 }
  ]
}
Query:
{
  "query": {
    "match_phrase": {
      "vendorName": {
        "query": "szja"
      }
    }
  }
}
As you can see, the analysis result contains the first letters szja, but the query returns no results. My guess is that the first letter a of "安" is being affected by the a in "健" (jian), and that is why nothing matches.
The same happens with other Chinese text. For example, 深圳恩 cannot be found with sze either: the first letter e of "恩" seems to be affected by the e in "深" (shen).
I have tried adjusting many parameters and still cannot solve this problem. Can anyone help?
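Query: { "query": { "match_phrase": { "vendorName": { "query": "ldha" } } } } As you can see, the analysis result contains the first letters ldha, but the query returns no results. My guess is that the first letter a of "阿" is being affected by the a in "华" (hua), and that is why nothing matches.
The analysis result does not contain ldha as a single token, so the query has nothing to match against. If you search for liudehua instead, you will get the result.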
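To make the point above concrete, here is a minimal Python sketch of a position-based phrase check run against the indexed tokens from Example 1. It is a simplified model, not Elasticsearch's actual `match_phrase` implementation: `build_postings` and `phrase_matches` are hypothetical helpers, and the query term lists are illustrative assumptions, because the thread does not show what the pinyin search analyzer actually emits for "ldha".

```python
# (token, position) pairs copied from the Example 1 analysis result.
INDEXED = [
    ("l", 0), ("liu", 0), ("刘德华阿里巴巴", 0), ("ldhalbb", 0),
    ("d", 1), ("de", 1), ("h", 2), ("hua", 2),
    ("a", 3), ("li", 4), ("b", 5), ("ba", 5),
]


def build_postings(tokens):
    """Map each indexed term to the set of positions where it occurs."""
    postings = {}
    for term, pos in tokens:
        postings.setdefault(term, set()).add(pos)
    return postings


def phrase_matches(postings, query_terms):
    """Simplified phrase check (assumption: slop 0, one term per query position).

    Term i of the query must be indexed at position start + i for some start.
    """
    if not query_terms:
        return False
    for start in postings.get(query_terms[0], ()):
        if all(start + i in postings.get(term, ())
               for i, term in enumerate(query_terms)):
            return True
    return False


postings = build_postings(INDEXED)

# "ldha" was never indexed as a term of its own; only the full
# first-letter string "ldhalbb" was.
print(phrase_matches(postings, ["ldha"]))              # False
print(phrase_matches(postings, ["ldhalbb"]))           # True

# Hypothetical query token streams, to show how the split decides the outcome.
print(phrase_matches(postings, ["liu", "de", "hua"]))  # True
print(phrase_matches(postings, ["l", "d", "ha"]))      # False: no "ha" term
```

To see the real query-side tokens rather than these assumed ones, the query string "ldha" can be run through `pinyin_analyzer` with the index's `_analyze` API and the output compared with the indexed tokens listed in the examples.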