You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Andrei Sereda (JIRA)" <ji...@apache.org> on 2019/05/22 23:43:00 UTC

[jira] [Commented] (CALCITE-3027) Support LIKE operator in Elasticsearch

    [ https://issues.apache.org/jira/browse/CALCITE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846291#comment-16846291 ] 

Andrei Sereda commented on CALCITE-3027:
----------------------------------------

[~xpleaf] I have left some comments in PR. 

Will you be able to address them before 1.20 release ? 

> Support LIKE operator in Elasticsearch
> --------------------------------------
>
>                 Key: CALCITE-3027
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3027
>             Project: Calcite
>          Issue Type: New Feature
>          Components: elasticsearch-adapter
>            Reporter: yeyonghao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In Elasticsearch, fuzzy matching is implemented by wildcard query:
> {code:java}
> GET /company/_search
> {
>   "query": {
>     "constant_score": {
>       "filter": {
>         "wildcard":{
>           "name_text":"*Alle_"
>         }
>       }
>     }
>   }
> }
> {code}
> The symbols % and _ in sql are equivalent to the symbols * and ? in es, respectively.
> So I added a new QueryBuilder class to support wildcard queries:
> {code:java}
> static class WildcardQueryBuilder extends QueryBuilder {
>     private final String fieldName;
>     private final String value;
>     WildcardQueryBuilder(String fieldName, String value) {
>       this.fieldName = fieldName;
>       this.value = transformValue(value);
>     }
>     @Override void writeJson(JsonGenerator generator) throws IOException {
>       generator.writeStartObject();
>       generator.writeFieldName("wildcard");
>       generator.writeStartObject();
>       generator.writeFieldName(fieldName);
>       writeObject(generator, value);
>       generator.writeEndObject();
>       generator.writeEndObject();
>     }
>     // The symbols % and _ in sql are equivalent to the symbols * and ? in es, respectively.
>     private String transformValue(String value) {
>       if (value != null) {
>         value = value
>                 .replaceAll("%", "*")
>                 .replaceAll("_", "?");
>       }
>       return value;
>     }
>   }
> {code}
> Below I give some examples to illustrate this like query.
> ----
> *Elasticsearch Version*
> {code:java}
> $ ./elasticsearch --version
> Version: 6.1.2, Build: Unknown/2019-03-19T15:31:12.734Z, JVM: 1.8.0_181
> {code}
> *Mock Data In Elasticsearch* 
> {code:java}
> PUT company
> {
>   "mappings": {
>     "doc": {
>       "properties": {
>         "name_keyword":{
>           "type": "keyword"
>         },
>         "name_text":{
>           "type": "text"
>         }
>       }
>     }
>   }
> }
> PUT /company/doc/_bulk?refresh
> {"index":{}}
> {"name_keyword":"hello to Allen","name_text":"hello to Allen"}
> {"index":{}}
> {"name_keyword":"say hi william","name_text":"say hi william"}
> {"index":{}}
> {"name_keyword":"hello to Rose","name_text":"hello to Rose"}
> {code}
> *Test in sqlline*
> model.json
> {code:java}
> {
>   "version": "1.0",
>   "defaultSchema": "elasticsearch",
>   "schemas": [
>     {
>       "type": "custom",
>       "name": "elasticsearch",
>       "factory": "org.apache.calcite.adapter.elasticsearch.ElasticsearchSchemaFactory",
>       "operand": {
>         "coordinates": "{'127.0.0.1': 9200}",
>         "index": "company"
>       }
>     }
>   ]
> }
> {code}
> sqlline test 
> {code:java}
> $ ./sqlline
> sqlline version 1.7.0
> sqlline> !connect jdbc:calcite:model=model.json admin admin
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/Users/yeyonghao/maven/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/Users/yeyonghao/maven/repository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Transaction isolation level TRANSACTION_REPEATABLE_READ is not supported. Default (TRANSACTION_NONE) will be used instead.
> 0: jdbc:calcite:model=model.json> select * from "doc";
> +---------------------------------------------------------+
> | _MAP |
> +---------------------------------------------------------+
> | {name_keyword=hello to Allen, name_text=hello to Allen} |
> | {name_keyword=hello to Rose, name_text=hello to Rose} |
> | {name_keyword=say hi william, name_text=say hi william} |
> +---------------------------------------------------------+
> 3 rows selected (1.536 seconds)
> 0: jdbc:calcite:model=model.json>
> 0: jdbc:calcite:model=model.json> select * from "doc" where _MAP['name_keyword'] like '%Alle_';
> +---------------------------------------------------------+
> | _MAP |
> +---------------------------------------------------------+
> | {name_keyword=hello to Allen, name_text=hello to Allen} |
> +---------------------------------------------------------+
> 1 row selected (0.092 seconds)
> 0: jdbc:calcite:model=model.json>
> 0: jdbc:calcite:model=model.json> select * from "doc" where _MAP['name_keyword'] like 'hello to Rose';
> +-------------------------------------------------------+
> | _MAP |
> +-------------------------------------------------------+
> | {name_keyword=hello to Rose, name_text=hello to Rose} |
> +-------------------------------------------------------+
> 1 row selected (0.063 seconds)
> 0: jdbc:calcite:model=model.json>
> 0: jdbc:calcite:model=model.json> select * from "doc" where _MAP['name_keyword'] like '__llo to Ros_';
> +-------------------------------------------------------+
> | _MAP |
> +-------------------------------------------------------+
> | {name_keyword=hello to Rose, name_text=hello to Rose} |
> +-------------------------------------------------------+
> 1 row selected (0.048 seconds)
> {code}
> ----
> *Attention*
> In Elasticsearch, the wildcard searches for the contents of the inverted index table, so the behavior is different for fields of type text and field type of keyword:
> 1.If the field type is keyword, es will index the entire field content, so the es wildcard search behavior is the same as sql's like.
> 2.If the field type is text, the content of the field will be saved into the index table after being segmented. So in this case, the es wildcard search is different from the like search in sql. In fact, at this time we recommend using full-text search instead of wildcard search.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)