Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/11 10:23:51 UTC

[GitHub] [incubator-doris] wuyunfeng opened a new issue #3304: [Doris On ES] Filters on analyzed ES string fields lose exact-match semantics

wuyunfeng opened a new issue #3304: [Doris On ES] Filters on analyzed ES string fields lose exact-match semantics
URL: https://github.com/apache/incubator-doris/issues/3304
 
 
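   ES can ingest data without the index (and its mapping) being created beforehand. In that case, when ES encounters a string field, it creates two kinds of fields for it, for example: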
   ```
   {
      "test": {
         "mappings": {
            "doc": {
               "properties": {
                  "create_time": {
                     "type": "date"
                  },
                  "k1": {
                     "type": "text",
                     "fields": {
                        "keyword": {
                           "type": "keyword",
                           "ignore_above": 256
                        }
                     }
                  }
               }
            }
         }
      }
   }
   ```
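   When ES stores k1 it produces two fields: `k1` and `k1.keyword`. `k1` is analyzed with the default analyzer (the standard analyzer, which splits English text into words), while `k1.keyword` is not analyzed.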
   Now suppose a document contains:
   
   ```
    "_source": {
                  "k1": "wu yun feng",
                  "create_time": "2019-08-13"
               }
   ```
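
To make the difference between the two fields concrete, here is a minimal Python sketch of the terms each field would hold for this document. It assumes the standard analyzer can be approximated by lowercasing and splitting on whitespace, which is enough for this simple value but not true in general. `analyze_standard` and `indexed_terms` are hypothetical helpers for illustration, not Elasticsearch APIs.

```python
def analyze_standard(value: str) -> list[str]:
    """Rough stand-in (assumption) for the ES standard analyzer."""
    # Lowercase and split on whitespace; the real analyzer does more.
    return value.lower().split()


def indexed_terms(value: str, ignore_above: int = 256) -> dict[str, list[str]]:
    """Return the terms indexed for k1 (analyzed) and k1.keyword (not analyzed)."""
    # A keyword sub-field stores the whole value as a single term, unless the
    # value is longer than ignore_above, in which case it is not indexed.
    keyword_terms = [value] if len(value) <= ignore_above else []
    return {"k1": analyze_standard(value), "k1.keyword": keyword_terms}


terms = indexed_terms("wu yun feng")
print(terms["k1"])          # ['wu', 'yun', 'feng']
print(terms["k1.keyword"])  # ['wu yun feng']
```

A term query compares its value against these stored terms as-is, without analyzing the query string.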
   
   In Doris On ES, a user may need to run `select * from test where k1 = 'wu yun feng'`.
   Doris On ES then generates the following ES DSL:
   
   ```
   {"term":{"k1":"wu yun feng"}}
   ```
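   Because k1 is an analyzed field, a term query for the whole string `wu yun feng` finds nothing. A user who writes `=` in SQL most likely wants an exact match, so we need to rewrite the DSL as: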
   
   ```
   {"term":{"k1.keyword":"wu yun feng"}}
   ```
   With this query the document is matched.
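
The rewrite described above could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Doris implementation: `find_keyword_subfield` and `build_term_query` are hypothetical helpers, and `PROPERTIES` simply mirrors the `properties` section of the example mapping.

```python
from typing import Optional

# Copied from the "properties" section of the example mapping in this issue.
PROPERTIES = {
    "create_time": {"type": "date"},
    "k1": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
}


def find_keyword_subfield(properties: dict, column: str) -> Optional[str]:
    """Return e.g. 'k1.keyword' if a text field has a keyword sub-field."""
    field = properties.get(column, {})
    if field.get("type") != "text":
        return None
    for name, sub in field.get("fields", {}).items():
        if sub.get("type") == "keyword":
            return f"{column}.{name}"
    return None


def build_term_query(properties: dict, column: str, value: str) -> dict:
    """Translate `column = value` into a term query, preferring the keyword sub-field."""
    target = find_keyword_subfield(properties, column) or column
    return {"term": {target: value}}


print(build_term_query(PROPERTIES, "k1", "wu yun feng"))
# {'term': {'k1.keyword': 'wu yun feng'}}
print(build_term_query(PROPERTIES, "create_time", "2019-08-13"))
# {'term': {'create_time': '2019-08-13'}}
```

One caveat with this approach: because of `ignore_above: 256`, values longer than 256 characters are not indexed in `k1.keyword`, so the rewritten query cannot match them.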
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay closed issue #3304: [Doris On ES] Filters on analyzed ES string fields lose exact-match semantics

Posted by GitBox <gi...@apache.org>.
imay closed issue #3304: [Doris On ES] Filters on analyzed ES string fields lose exact-match semantics
URL: https://github.com/apache/incubator-doris/issues/3304
 
 
   
