You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/11 10:25:35 UTC

[GitHub] [incubator-doris] wuyunfeng opened a new pull request #3305: Add field context for string field keyword type

wuyunfeng opened a new pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407499089
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/external/EsStateStore.java
 ##########
 @@ -209,36 +206,54 @@ public EsTableState parseClusterState55(String responseStr, EsTable esTable)
                 }
                 JSONObject fieldObject = schema.optJSONObject(colName);
                 String fieldType = fieldObject.optString("type");
-                if (EsTable.DEFAULT_DOCVALUE_DISABLED_FIELDS.contains(fieldType)) {
-                    JSONObject fieldsObject = fieldObject.optJSONObject("fields");
-                    if (fieldsObject != null) {
-                        for (String key : fieldsObject.keySet()) {
-                            JSONObject innerTypeObject = fieldsObject.optJSONObject(key);
-                            if (EsTable.DEFAULT_DOCVALUE_DISABLED_FIELDS.contains(innerTypeObject.optString("type"))) {
-                                continue;
-                            }
-                            if (innerTypeObject.has("doc_values")) {
-                                boolean docValue = innerTypeObject.getBoolean("doc_values");
-                                if (docValue) {
-                                    esTable.addDocValueField(colName, colName);
+                // string-type field used keyword type to generate predicate
+                if (esTable.isKeywordSniffEnable()) {
+                    // if text field type seen, we should use the `field` keyword type?
+                    if ("text".equals(fieldType)) {
 
 Review comment:
   Elasticsearch's type must be lowercase...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
imay commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407482433
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/external/EsStateStore.java
 ##########
 @@ -209,36 +206,54 @@ public EsTableState parseClusterState55(String responseStr, EsTable esTable)
                 }
                 JSONObject fieldObject = schema.optJSONObject(colName);
                 String fieldType = fieldObject.optString("type");
-                if (EsTable.DEFAULT_DOCVALUE_DISABLED_FIELDS.contains(fieldType)) {
-                    JSONObject fieldsObject = fieldObject.optJSONObject("fields");
-                    if (fieldsObject != null) {
-                        for (String key : fieldsObject.keySet()) {
-                            JSONObject innerTypeObject = fieldsObject.optJSONObject(key);
-                            if (EsTable.DEFAULT_DOCVALUE_DISABLED_FIELDS.contains(innerTypeObject.optString("type"))) {
-                                continue;
-                            }
-                            if (innerTypeObject.has("doc_values")) {
-                                boolean docValue = innerTypeObject.getBoolean("doc_values");
-                                if (docValue) {
-                                    esTable.addDocValueField(colName, colName);
+                // string-type field used keyword type to generate predicate
+                if (esTable.isKeywordSniffEnable()) {
+                    // if text field type seen, we should use the `field` keyword type?
+                    if ("text".equals(fieldType)) {
 
 Review comment:
   
   ignorecase compare?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] stalary commented on issue #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
stalary commented on issue #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#issuecomment-612760776
 
 
   How is the review going? I need this feature. @morningman 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay merged pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
imay merged pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407062599
 
 

 ##########
 File path: be/src/exec/es/es_predicate.h
 ##########
 @@ -198,6 +198,10 @@ class EsPredicate {
         return _es_query_status;
     }
 
+    void field_context(const std::map<std::string, std::string>& field_context) {
 
 Review comment:
   And your UT failed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407176067
 
 

 ##########
 File path: be/src/exec/es/es_predicate.cpp
 ##########
 @@ -298,8 +302,12 @@ Status EsPredicate::build_disjuncts_list(const Expr* conjunct) {
         } else {
             is_not_null = true;
         }
+        std::string col = slot_desc->col_name();
 
 Review comment:
   maybe introduce a common function would increase the complexity

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
imay commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407475374
 
 

 ##########
 File path: gensrc/thrift/PlanNodes.thrift
 ##########
 @@ -222,6 +222,18 @@ struct TEsScanNode {
     // {"city": "city.raw"}
     // use select city from table, if enable the docvalue, we will fetch the `city` field value from `city.raw`
     3: optional map<string, string> docvalue_context
+    // used to indicate which string-type field predicate should used xxx.keyword etc.
+    // "k1": {
+    //    "type": "text",
+    //    "fields": {
+    //        "keyword": {
+    //            "type": "keyword",
+    //            "ignore_above": 256
+    //           }
+    //    }
+    // }
+    // k1 > 'abc' -> k1.keyword > 'abc'
+    4: optional map<string, string> fields_context
 
 Review comment:
   If this is only used for keyword path, `fields_context` seems no relation with keyword

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on issue #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#issuecomment-612418725
 
 
   > #3304
   > 
   > 这个方案其实是一个过度的解决方案,最佳的解决方案应该是将对EsScanNode的谓词转换在FE中完成,BE只负责扫描,这样将对查询的处理和执行完全分开。
   > 
   > 建表时增加一个 `enable_keyword_sniff ` 配置,默认是 "true", 会对分词字段进行探测获取`keyword`类型字段的`json path`.
   > 
   > ```
   > 
   > CREATE EXTERNAL TABLE `test` (
   >   `k1` varchar(20) COMMENT "",
   >   `create_time` datetime COMMENT ""
   > ) ENGINE=ELASTICSEARCH
   > PROPERTIES (
   > "hosts" = "http://10.74.167.16:8200",
   > "user" = "root",
   > "password" = "root",
   > "index" = "test",
   > "type" = "doc",
   > "enable_keyword_sniff" = "true"
   > );
   > ```
   > 
   > 其中 `enable_keyword_sniff` 默认就是true
   > 
   > 效果:
   > 
   > ```
   > select * from test where k1 = "wu yun feng"
   > ```
   > 
   > 输出:
   > 
   > ```
   > {"term":{"k1.keyword":"wu yun feng"}}
   > ```
   > 
   > 在这个PR中,我暂时把对ES的version检测移除了
   > 
   > 未来几个月将会将谓词的转换全部上移到FE中,处理这种就变得非常简单了。
   
   Better in English

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407048747
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/planner/EsScanNode.java
 ##########
 @@ -123,6 +123,9 @@ protected void toThrift(TPlanNode msg) {
         if (table.isDocValueScanEnable()) {
             esScanNode.setDocvalue_context(table.docValueContext());
         }
+        if (table.isKeywordSniffEnable() && table.fieldsContext().size() > 0) {
 
 Review comment:
   这两个是一样的,第一个条件是用户主动配置的,第二个目前代码都是这么写的^o^

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
kangkaisen commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407066649
 
 

 ##########
 File path: be/src/exec/es/es_predicate.cpp
 ##########
 @@ -298,8 +302,12 @@ Status EsPredicate::build_disjuncts_list(const Expr* conjunct) {
         } else {
             is_not_null = true;
         }
+        std::string col = slot_desc->col_name();
 
 Review comment:
   Repeat three times.
   ```
           std::string col = slot_desc->col_name();
           if (_field_context.find(col) != _field_context.end()) {
               col = _field_context[col];
           }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407510288
 
 

 ##########
 File path: gensrc/thrift/PlanNodes.thrift
 ##########
 @@ -222,6 +222,18 @@ struct TEsScanNode {
     // {"city": "city.raw"}
     // use select city from table, if enable the docvalue, we will fetch the `city` field value from `city.raw`
     3: optional map<string, string> docvalue_context
+    // used to indicate which string-type field predicate should used xxx.keyword etc.
+    // "k1": {
+    //    "type": "text",
+    //    "fields": {
+    //        "keyword": {
+    //            "type": "keyword",
+    //            "ignore_above": 256
+    //           }
+    //    }
+    // }
+    // k1 > 'abc' -> k1.keyword > 'abc'
+    4: optional map<string, string> fields_context
 
 Review comment:
   https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
   
   fields is what it means to be, maybe enable_keyword_sniff is no t suitable?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
imay commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407526745
 
 

 ##########
 File path: gensrc/thrift/PlanNodes.thrift
 ##########
 @@ -222,6 +222,18 @@ struct TEsScanNode {
     // {"city": "city.raw"}
     // use select city from table, if enable the docvalue, we will fetch the `city` field value from `city.raw`
     3: optional map<string, string> docvalue_context
+    // used to indicate which string-type field predicate should used xxx.keyword etc.
+    // "k1": {
+    //    "type": "text",
+    //    "fields": {
+    //        "keyword": {
+    //            "type": "keyword",
+    //            "ignore_above": 256
+    //           }
+    //    }
+    // }
+    // k1 > 'abc' -> k1.keyword > 'abc'
+    4: optional map<string, string> fields_context
 
 Review comment:
   You're right

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
kangkaisen commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407066649
 
 

 ##########
 File path: be/src/exec/es/es_predicate.cpp
 ##########
 @@ -298,8 +302,12 @@ Status EsPredicate::build_disjuncts_list(const Expr* conjunct) {
         } else {
             is_not_null = true;
         }
+        std::string col = slot_desc->col_name();
 
 Review comment:
   Repeat four times.
   ```
           std::string col = slot_desc->col_name();
           if (_field_context.find(col) != _field_context.end()) {
               col = _field_context[col];
           }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407062113
 
 

 ##########
 File path: be/src/exec/es/es_predicate.h
 ##########
 @@ -198,6 +198,10 @@ class EsPredicate {
         return _es_query_status;
     }
 
+    void field_context(const std::map<std::string, std::string>& field_context) {
 
 Review comment:
   better to call `set_ field_context`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
wuyunfeng commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407176067
 
 

 ##########
 File path: be/src/exec/es/es_predicate.cpp
 ##########
 @@ -298,8 +302,12 @@ Status EsPredicate::build_disjuncts_list(const Expr* conjunct) {
         } else {
             is_not_null = true;
         }
+        std::string col = slot_desc->col_name();
 
 Review comment:
   maybe instroduce a common function would increase the complexity

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] stalary commented on a change in pull request #3305: Add field context for string field keyword type

Posted by GitBox <gi...@apache.org>.
stalary commented on a change in pull request #3305: Add field context for string field keyword type
URL: https://github.com/apache/incubator-doris/pull/3305#discussion_r407048050
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/planner/EsScanNode.java
 ##########
 @@ -123,6 +123,9 @@ protected void toThrift(TPlanNode msg) {
         if (table.isDocValueScanEnable()) {
             esScanNode.setDocvalue_context(table.docValueContext());
         }
+        if (table.isKeywordSniffEnable() && table.fieldsContext().size() > 0) {
 
 Review comment:
   !table.fieldsContext().isEmpty()是不是更好一点

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org