You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Mikhail Krupitskiy <mi...@jetbrains.com> on 2016/09/13 15:06:09 UTC

How to query '%' character using LIKE operator in Cassandra 3.7?

Hi Cassandra guys,

I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
Here is my test script:

DROP keyspace if exists kmv;
CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
USE kmv;
CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
'analyzed' : 'true',
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive' : 'false', 
'mode' : 'CONTAINS'
};

INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');

SELECT c2 from kmv.kmv where c2 like ‘_pattern_';

_pattern_ '%%%' finds all columns that contain %.
How to find columns that start form ‘%’ or ‘%a’?
How to find columns that end with ‘%’?
What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).


Thanks,
Mikhail

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Ok I get around the issue about %w%a%

So this will be interpreter first by the CQL parser as LIKE CONTAINS with
searched term = w%a

And then things get complicated

1) if you're using NonTokeninzingAnalyzer or NoOpAnalyzer, everything is
fine, the % in 'w%a' is interpreted as simple literal and not wildcard
character

2) if you're using StandardAnalyzer, it's an entirely different story.
During the parsing of the search predicates by the query planer, the term
'w%a' is passed to the analyzer (StandardAnalyzer here):
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L303-L323

The StandardAnalyzer is tokenizing the search term so 'w%a' becomes 2
distinct token, 'w' OR 'a'. Why does it ignore the % ? Because according to
Unicode line breaking rule, % is a separator, read here:
http://www.unicode.org/Public/UNIDATA/LineBreak.txt

Nowhere in the source code we can see this, in fact you'll need to look
into the JFlex grammar file
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex
to see a reference to Unicode word breaking rules ...

So indeed when using StandardAnalyzer, any % character will be interpreter
as a separator so our LIKE '%w%a%' is indeed transformed into a LIKE '%w%'
OR LIKE '%a%' e.g all words containing 'w' OR 'a', irrespective of their
relative position to each other ...

Why is it an OR predicate and not an AND predicate ? The answer is a
comment in the source code here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L290-L295

I'll end by a famous sentence : "It is not a bug, it is a feature"  :D

On Thu, Sep 15, 2016 at 4:11 PM, DuyHai Doan <do...@gmail.com> wrote:

> Currently SASI can only understand the % for the beginning (suffix) or
> ending (prefix) position.
>
> Any expression containing the % in the middle like %w%a% will not be
> interpreter by SASI as wildcard.
>
> %w%a% will translate into "Give me all results containing w%a
>
> On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Thank you for the investigation. Will wait for a fix and news.
>>
>> Probably it’s not a directly related question but what do you think about
>> CASSANDRA-12573? Let me know if it’s better to create a separate thread for
>> it.
>>
>> Thanks,
>> Mikhail
>>
>>
>> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
>>
>> Ok so I've found the source of the issue, it's pretty well hidden because
>> it is NOT in the SASI source code directly.
>>
>> Here is the method where C* determines what kind of LIKE expression
>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
>> to remove escape character BEFORE doing the matching so if your search is LIKE
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>>
>>
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com>
>> wrote:
>>
>>> Ok you're right, I get your point
>>>
>>> LIKE '%%esc%' --> startWith('%esc')
>>>
>>> LIKE 'escape%%' -->  = 'escape%'
>>>
>>> What I strongly suspect is that in the source code of SASI, we parse the
>>> % xxx % expression BEFORE applying escape. That will explain the observed
>>> behavior. E.g:
>>>
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>
>>> Let me check in the source code and try to reproduce the issue
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>>> Looks like we have different understanding of what results are expected.
>>>> I based my understanding on http://docs.datastax.com/en
>>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>>> According to the doc ‘esc’ is a pattern for exact match and I guess
>>>> that there is no semantical difference between two LIKE patterns (both of
>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>> *%esc*
>>>>
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>
>>>>
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>> *escape%*esc
>>>>
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>
>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>>> returns nothing (CASSANDRA-12573).
>>>>
>>>> What I’m missing?
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>>>
>>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>
>>>> I don't see any problem in the results you got
>>>>
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>> *%esc*
>>>>
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>
>>>>
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>> *escape%*esc
>>>>
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>>> Thanks for the reply.
>>>>> Could you please provide what index definition did you use?
>>>>> With the index from my script I get the following results:
>>>>>
>>>>> cqlsh:test> select * from escape;
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>> *  3 | escape%esc*
>>>>>
>>>>> Contains search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   3
>>>>> * | escape%esc*(2 rows)
>>>>>
>>>>>
>>>>> Prefix search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>   3
>>>>> * | escape%esc*
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>>>
>>>>> Use % to escape %
>>>>>
>>>>> cqlsh:test> select * from escape;
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>>
>>>>>
>>>>> Contains search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>
>>>>> (1 rows)
>>>>>
>>>>>
>>>>> Prefix search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>
>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>
>>>>>> Hi Cassandra guys,
>>>>>>
>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in
>>>>>> a search pattern.
>>>>>> Here is my test script:
>>>>>>
>>>>>> DROP keyspace if exists kmv;
>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>> USE kmv;
>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>>> KEY(id, c1));
>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>>> 'analyzed' : 'true',
>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>>> 'case_sensitive' : 'false',
>>>>>> 'mode' : 'CONTAINS'
>>>>>> };
>>>>>>
>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>
>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>
>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>> How to find columns that end with ‘%’?
>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>>>> CASSANDRA-12573).
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Currently SASI can only understand the % for the beginning (suffix) or
ending (prefix) position.

Any expression containing the % in the middle like %w%a% will not be
interpreter by SASI as wildcard.

%w%a% will translate into "Give me all results containing w%a

On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Thank you for the investigation. Will wait for a fix and news.
>
> Probably it’s not a directly related question but what do you think about
> CASSANDRA-12573? Let me know if it’s better to create a separate thread for
> it.
>
> Thanks,
> Mikhail
>
>
> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
>
> Ok so I've found the source of the issue, it's pretty well hidden because
> it is NOT in the SASI source code directly.
>
> Here is the method where C* determines what kind of LIKE expression you're
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>
> https://github.com/apache/cassandra/blob/trunk/src/java/
> org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#
> L733-L778
>
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
> to remove escape character BEFORE doing the matching so if your search is LIKE
> '%%esc%', the detected expression is LIKE_CONTAINS.
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
>
>
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com> wrote:
>
>> Ok you're right, I get your point
>>
>> LIKE '%%esc%' --> startWith('%esc')
>>
>> LIKE 'escape%%' -->  = 'escape%'
>>
>> What I strongly suspect is that in the source code of SASI, we parse the
>> % xxx % expression BEFORE applying escape. That will explain the observed
>> behavior. E.g:
>>
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>
>> Let me check in the source code and try to reproduce the issue
>>
>>
>>
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on http://docs.datastax.com/en
>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>>> there is no semantical difference between two LIKE patterns (both of
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>> returns nothing (CASSANDRA-12573).
>>>
>>> What I’m missing?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>>
>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>
>>> I don't see any problem in the results you got
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>>
>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>>> Thanks for the reply.
>>>> Could you please provide what index definition did you use?
>>>> With the index from my script I get the following results:
>>>>
>>>> cqlsh:test> select * from escape;
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>> *  3 | escape%esc*
>>>>
>>>> Contains search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   3
>>>> * | escape%esc*(2 rows)
>>>>
>>>>
>>>> Prefix search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>>   3
>>>> * | escape%esc*
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>>
>>>> Use % to escape %
>>>>
>>>> cqlsh:test> select * from escape;
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>>
>>>>
>>>> Contains search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>
>>>> (1 rows)
>>>>
>>>>
>>>> Prefix search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>>
>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>>> Hi Cassandra guys,
>>>>>
>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>>>>> search pattern.
>>>>> Here is my test script:
>>>>>
>>>>> DROP keyspace if exists kmv;
>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>> USE kmv;
>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>> KEY(id, c1));
>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>> 'analyzed' : 'true',
>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>> 'case_sensitive' : 'false',
>>>>> 'mode' : 'CONTAINS'
>>>>> };
>>>>>
>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>
>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>
>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>> How to find columns that end with ‘%’?
>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>>> CASSANDRA-12573).
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Thank you for the investigation. Will wait for a fix and news.

Probably it’s not a directly related question but what do you think about CASSANDRA-12573? Let me know if it’s better to create a separate thread for it.

Thanks,
Mikhail

> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   3 | escape%esc
>> (2 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> ----+-----------
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Thanks,
>> Mikhail 
>> 
>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Use % to escape %
>>> 
>>> cqlsh:test> select * from escape;
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   2 | escape%me
>>> 
>>> 
>>> Contains search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>> 
>>> (1 rows)
>>> 
>>> 
>>> Prefix search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   2 | escape%me
>>> 
>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> Hi Cassandra guys,
>>> 
>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>> Here is my test script:
>>> 
>>> DROP keyspace if exists kmv;
>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>> USE kmv;
>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>> 'analyzed' : 'true',
>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive' : 'false',
>>> 'mode' : 'CONTAINS'
>>> };
>>> 
>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>> 
>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>> 
>>> _pattern_ '%%%' finds all columns that contain %.
>>> How to find columns that start form ‘%’ or ‘%a’?
>>> How to find columns that end with ‘%’?
>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>> 
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>> 
>> 
> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Please see my comments inline.

Thanks,
Mikhail
> On 26 Sep 2016, at 17:07, DuyHai Doan <do...@gmail.com> wrote:
> 
> "In the current implementation (‘%’ could be a wildcard only at the start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘." 
> 
> --> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI to understand the %% as an escape for % so the goal is that SASI understands LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ?
I guess that the goal is to define a way to use ‘%’ as a simple char.
LIKE '%escape' - ENDS WITH 'escape'
LIKE '%%escape' - EQUALS TO '%escape’
LIKE '%%escape%' - STARTS WITH '%escape’

LIKE ‘%%%escape’ - undefined in general case
LIKE ‘%%%escape’ - ENDS WITH “%escape” in a case when we know that a wildcard could be only at the start/end.
> 
> "Moreover all terms that contains single ‘%’ somewhere in the middle should cause an exception."
> 
> --> Not necessarily, sometime people may want to search text pattern containing the literal %. Imagine the text "this year the average income has increase by 10%". People may want to search for "10%”.
If someone wants to search for ’10%’ then he should escape the ‘%’ char: like “10%%”.
> 
> 
> 
> "BUT may be it’s better to make escaping more universal to support a future possible case where a wildcard could be placed in the middle of a term too?"
> 
> --> I guess universal escaping for % is the cleaner and better solution. However it may involve some complex regular expression. I'm not sure that input.replaceAll("%%", "%") trick would work for any cases.
As I wrote I don’t think that it’s possible to do universal escaping using ‘%’ as an escape char (a char to escape wildcard char to make it a simple char semantically) and as wildcard at the same time.
I suggest to use “\” as an escape char.
Also I don’t know enough about Cassandra’s internals to estimate how universal escaping will affect performance.
It really looks like a better solution for Cassandra users.
> 
> And we also need to define when we want to detect operation type (LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ? 
> 
> Should we detect operation type BEFORE escaping or AFTER escaping ?
As I understand ‘escaping' will be done by users. 
So on DB level we get an already escaped string from a request and it’s possible to know which symbol is a wildcard and which is just a char.
I guess that Cassandra should parse (unescape?) an incoming string to define wildcards positions and simple chars positions and then define an operation type.

 
> 
> 
> 
> 
> 
> On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> In the current implementation (‘%’ could be a wildcard only at the start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘.
> Moreover all terms that contains single ‘%’ somewhere in the middle should cause an exception.
> BUT may be it’s better to make escaping more universal to support a future possible case where a wildcard could be placed in the middle of a term too?
> 
> Thanks,
> Mikhail 
>> On 24 Sep 2016, at 21:09, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Reminder, right now, the % character is only interpreted as wildcard IF AND ONLY IF it is the first/last character of the searched term
>> 
>> 
>> LIKE '%escape' --> ENDS WITH 'escape' 
>> 
>> If we use % to escape %,
>> LIKE '%%escape' -->  EQUALS TO '%escape'
>> 
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>> 
>> 
>> 
>> 
>> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> Hi, Jim,
>> 
>> What pattern should be used to search “ends with  ‘%escape’ “ with your conception?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 22 Sep 2016, at 18:51, Jim Ancona <jim@anconafamily.com <ma...@anconafamily.com>> wrote:
>>> 
>>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>>> So the first two %'s are translated to a literal, non-wildcard % and the third % is a wildcard because it's not doubled.
>>> 
>>> Jim
>>> 
>>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> I guess that it should be similar to how it is done in SQL for LIKE patterns.
>>> 
>>> You can introduce an escape character, e.g. ‘\’.
>>> Examples:
>>> ‘%’ - any string
>>> ‘\%’ - equal to ‘%’ character
>>> ‘\%foo%’ - starts from ‘%foo’
>>> ‘%%%escape’ - ends with ’escape’
>>> ‘\%%’ - starts from ‘%’
>>> ‘\\\%%’ - starts from ‘\%’ .
>>> 
>>> What do you think?
>>> 
>>> Thanks,
>>> Mikhail
>>>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> Hello Mikhail
>>>> 
>>>> It's more complicated that it seems
>>>> 
>>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>>> 
>>>> LIKE '%escape' means ENDS WITH 'escape'
>>>> 
>>>> What's about LIKE '%%%escape' ????
>>>> 
>>>> How should we treat this case ? Replace %% by % at the beginning of the searched term ??
>>>> 
>>>> 
>>>> 
>>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>> Hi!
>>>> 
>>>> We’ve talked about two items:
>>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of LIKE.
>>>> 
>>>> Item #1was resolved as CASSANDRA-12573.
>>>> 
>>>> Regarding to item #2: you said the following:
>>>>> A possible fix would be:
>>>>> 
>>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>>>> 
>>>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>>>> 
>>>>> Let me try a patch 
>>>> 
>>>> So is there any update on this?
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>> 
>>>> 
>>>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>> 
>>>>> Hi!
>>>>> 
>>>>> Have you had a chance to try your patch or solve the issue in an other way? 
>>>>> 
>>>>> Thanks,
>>>>> Mikhail
>>>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>> 
>>>>>> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
>>>>>> 
>>>>>> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>>>> 
>>>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>>>>> 
>>>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>>>> 
>>>>>> A possible fix would be:
>>>>>> 
>>>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>>>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>>>>> 
>>>>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>>>>> 
>>>>>> Let me try a patch  
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>> Ok you're right, I get your point
>>>>>> 
>>>>>> LIKE '%%esc%' --> startWith('%esc')
>>>>>> 
>>>>>> LIKE 'escape%%' -->  = 'escape%'
>>>>>> 
>>>>>> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
>>>>>> 
>>>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>>>> 
>>>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>>>> 
>>>>>> Let me check in the source code and try to reproduce the issue
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>>> Looks like we have different understanding of what results are expected.
>>>>>> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>>> 
>>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>>> 
>>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>> 
>>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
>>>>>> 
>>>>>> What I’m missing?
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>> 
>>>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>>>> 
>>>>>>> I don't see any problem in the results you got
>>>>>>> 
>>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>>> 
>>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>>>> Thanks for the reply.
>>>>>>> Could you please provide what index definition did you use?
>>>>>>> With the index from my script I get the following results:
>>>>>>> 
>>>>>>> cqlsh:test> select * from escape;
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   2 | escape%me
>>>>>>>   3 | escape%esc
>>>>>>> 
>>>>>>> Contains search
>>>>>>> 
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   3 | escape%esc
>>>>>>> (2 rows)
>>>>>>> 
>>>>>>> 
>>>>>>> Prefix search
>>>>>>> 
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   2 | escape%me
>>>>>>>   3 | escape%esc
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikhail 
>>>>>>> 
>>>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>> Use % to escape %
>>>>>>>> 
>>>>>>>> cqlsh:test> select * from escape;
>>>>>>>> 
>>>>>>>>  id | val
>>>>>>>> ----+-----------
>>>>>>>>   1 | %escapeme
>>>>>>>>   2 | escape%me
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Contains search
>>>>>>>> 
>>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>>> 
>>>>>>>>  id | val
>>>>>>>> ----+-----------
>>>>>>>>   1 | %escapeme
>>>>>>>> 
>>>>>>>> (1 rows)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Prefix search
>>>>>>>> 
>>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>>> 
>>>>>>>>  id | val
>>>>>>>> ----+-----------
>>>>>>>>   2 | escape%me
>>>>>>>> 
>>>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>>>>> Hi Cassandra guys,
>>>>>>>> 
>>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>>>>>>> Here is my test script:
>>>>>>>> 
>>>>>>>> DROP keyspace if exists kmv;
>>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>>>> USE kmv;
>>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>>>>>> 'analyzed' : 'true',
>>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>>>>> 'case_sensitive' : 'false',
>>>>>>>> 'mode' : 'CONTAINS'
>>>>>>>> };
>>>>>>>> 
>>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>>> 
>>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>>> 
>>>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>>>> How to find columns that end with ‘%’?
>>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Mikhail
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
"In the current implementation (‘%’ could be a wildcard only at the
start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘."

--> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI
to understand the %% as an escape for % so the goal is that SASI
understands LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ?

"Moreover all terms that contains single ‘%’ somewhere in the middle should
cause an exception."

--> Not necessarily, sometime people may want to search text pattern
containing the literal %. Imagine the text "this year the average income
has increase by 10%". People may want to search for "10%".


"BUT may be it’s better to make escaping more universal to support a future
possible case where a wildcard could be placed in the middle of a term too?"

--> I guess universal escaping for % is the cleaner and better solution.
However it may involve some complex regular expression. I'm not sure that
input.replaceAll("%%", "%") trick would work for any cases.

And we also need to define when we want to detect operation type
(LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ?

Should we detect operation type BEFORE escaping or AFTER escaping ?





On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>
> In the current implementation (‘%’ could be a wildcard only at the
> start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘.
> Moreover all terms that contains single ‘%’ somewhere in the middle should
> cause an exception.
> BUT may be it’s better to make escaping more universal to support a future
> possible case where a wildcard could be placed in the middle of a term too?
>
> Thanks,
> Mikhail
>
> On 24 Sep 2016, at 21:09, DuyHai Doan <do...@gmail.com> wrote:
>
> Reminder, right now, the % character is only interpreted as wildcard IF
> AND ONLY IF it is the first/last character of the searched term
>
>
> LIKE '%escape' --> ENDS WITH 'escape'
>
> If we use % to escape %,
> LIKE '%%escape' -->  EQUALS TO '%escape'
>
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>
>
>
>
> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Hi, Jim,
>>
>> What pattern should be used to search “ends with  ‘%escape’ “ with your
>> conception?
>>
>> Thanks,
>> Mikhail
>>
>> On 22 Sep 2016, at 18:51, Jim Ancona <ji...@anconafamily.com> wrote:
>>
>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>
>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'
>>
>> So the first two %'s are translated to a literal, non-wildcard % and the
>> third % is a wildcard because it's not doubled.
>>
>> Jim
>>
>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> I guess that it should be similar to how it is done in SQL for LIKE
>>> patterns.
>>>
>>> You can introduce an escape character, e.g. ‘\’.
>>> Examples:
>>> ‘%’ - any string
>>> ‘\%’ - equal to ‘%’ character
>>> ‘\%foo%’ - starts from ‘%foo’
>>> ‘%%%escape’ - ends with ’escape’
>>> ‘\%%’ - starts from ‘%’
>>> ‘\\\%%’ - starts from ‘\%’ .
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 22 Sep 2016, at 16:47, DuyHai Doan <do...@gmail.com> wrote:
>>>
>>> Hello Mikhail
>>>
>>> It's more complicated that it seems
>>>
>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>>
>>> LIKE '%escape' means ENDS WITH 'escape'
>>>
>>> What's about LIKE '%%%escape' ????
>>>
>>> How should we treat this case ? Replace %% by % at the beginning of the
>>> searched term ??
>>>
>>>
>>>
>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> We’ve talked about two items:
>>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>>>> help of LIKE.
>>>>
>>>> Item #1was resolved as CASSANDRA-12573.
>>>>
>>>> Regarding to item #2: you said the following:
>>>>
>>>> A possible fix would be:
>>>>
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending
>>>> on the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>>> regex to exclude the %% from parsing e.g
>>>>
>>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>>> converting the bytebuffer into String at this stage of the CQL parser is
>>>> expensive or not (in term of computation)
>>>>
>>>> Let me try a patch
>>>>
>>>> So is there any update on this?
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>>
>>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Have you had a chance to try your patch or solve the issue in an other
>>>> way?
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
>>>>
>>>> Ok so I've found the source of the issue, it's pretty well hidden
>>>> because it is NOT in the SASI source code directly.
>>>>
>>>> Here is the method where C* determines what kind of LIKE expression
>>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>>
>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>>>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.j
>>>> ava#L733-L778
>>>>
>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they
>>>> forget to remove escape character BEFORE doing the matching so if your
>>>> search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>>
>>>> A possible fix would be:
>>>>
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending
>>>> on the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>>> regex to exclude the %% from parsing e.g
>>>>
>>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>>> converting the bytebuffer into String at this stage of the CQL parser is
>>>> expensive or not (in term of computation)
>>>>
>>>> Let me try a patch
>>>>
>>>>
>>>>
>>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com>
>>>> wrote:
>>>>
>>>>> Ok you're right, I get your point
>>>>>
>>>>> LIKE '%%esc%' --> startWith('%esc')
>>>>>
>>>>> LIKE 'escape%%' -->  = 'escape%'
>>>>>
>>>>> What I strongly suspect is that in the source code of SASI, we parse
>>>>> the % xxx % expression BEFORE applying escape. That will explain the
>>>>> observed behavior. E.g:
>>>>>
>>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>>>
>>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>>>
>>>>> Let me check in the source code and try to reproduce the issue
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>
>>>>>> Looks like we have different understanding of what results are
>>>>>> expected.
>>>>>> I based my understanding on http://docs.datastax.com/en
>>>>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess
>>>>>> that there is no semantical difference between two LIKE patterns (both of
>>>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>>>
>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also
>>>>>> escape*%esc*
>>>>>>
>>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>>
>>>>>>
>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>>>> *escape%*esc
>>>>>>
>>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>>
>>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>>>>> returns nothing (CASSANDRA-12573).
>>>>>>
>>>>>> What I’m missing?
>>>>>>
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>>
>>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>>>>>
>>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>>>
>>>>>> I don't see any problem in the results you got
>>>>>>
>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also
>>>>>> escape*%esc*
>>>>>>
>>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>>
>>>>>>
>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>>>> *escape%*esc
>>>>>>
>>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>> Could you please provide what index definition did you use?
>>>>>>> With the index from my script I get the following results:
>>>>>>>
>>>>>>> cqlsh:test> select * from escape;
>>>>>>>
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   2 | escape%me
>>>>>>> *  3 | escape%esc*
>>>>>>>
>>>>>>> Contains search
>>>>>>>
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>>
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   3
>>>>>>> * | escape%esc*(2 rows)
>>>>>>>
>>>>>>>
>>>>>>> Prefix search
>>>>>>>
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>>
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   2 | escape%me
>>>>>>>   3
>>>>>>> * | escape%esc*
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikhail
>>>>>>>
>>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>>>>>
>>>>>>> Use % to escape %
>>>>>>>
>>>>>>> cqlsh:test> select * from escape;
>>>>>>>
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   2 | escape%me
>>>>>>>
>>>>>>>
>>>>>>> Contains search
>>>>>>>
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>>
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>
>>>>>>> (1 rows)
>>>>>>>
>>>>>>>
>>>>>>> Prefix search
>>>>>>>
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>>
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   2 | escape%me
>>>>>>>
>>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>>>
>>>>>>>> Hi Cassandra guys,
>>>>>>>>
>>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char
>>>>>>>> in a search pattern.
>>>>>>>> Here is my test script:
>>>>>>>>
>>>>>>>> DROP keyspace if exists kmv;
>>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>>>> USE kmv;
>>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>>>>> KEY(id, c1));
>>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>>>>> 'analyzed' : 'true',
>>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>>>>> 'case_sensitive' : 'false',
>>>>>>>> 'mode' : 'CONTAINS'
>>>>>>>> };
>>>>>>>>
>>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>>>
>>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>>>
>>>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>>>> How to find columns that end with ‘%’?
>>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to
>>>>>>>> differentiate ‘%’ char form % as a command symbol? (Also there is a related
>>>>>>>> issue CASSANDRA-12573).
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mikhail
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
In the current implementation (‘%’ could be a wildcard only at the start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘.
Moreover all terms that contains single ‘%’ somewhere in the middle should cause an exception.
BUT may be it’s better to make escaping more universal to support a future possible case where a wildcard could be placed in the middle of a term too?

Thanks,
Mikhail 
> On 24 Sep 2016, at 21:09, DuyHai Doan <do...@gmail.com> wrote:
> 
> Reminder, right now, the % character is only interpreted as wildcard IF AND ONLY IF it is the first/last character of the searched term
> 
> 
> LIKE '%escape' --> ENDS WITH 'escape' 
> 
> If we use % to escape %,
> LIKE '%%escape' -->  EQUALS TO '%escape'
> 
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> 
> 
> 
> 
> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> Hi, Jim,
> 
> What pattern should be used to search “ends with  ‘%escape’ “ with your conception?
> 
> Thanks,
> Mikhail
> 
>> On 22 Sep 2016, at 18:51, Jim Ancona <jim@anconafamily.com <ma...@anconafamily.com>> wrote:
>> 
>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>> So the first two %'s are translated to a literal, non-wildcard % and the third % is a wildcard because it's not doubled.
>> 
>> Jim
>> 
>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> I guess that it should be similar to how it is done in SQL for LIKE patterns.
>> 
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>> 
>> What do you think?
>> 
>> Thanks,
>> Mikhail
>>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Hello Mikhail
>>> 
>>> It's more complicated that it seems
>>> 
>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>> 
>>> LIKE '%escape' means ENDS WITH 'escape'
>>> 
>>> What's about LIKE '%%%escape' ????
>>> 
>>> How should we treat this case ? Replace %% by % at the beginning of the searched term ??
>>> 
>>> 
>>> 
>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> Hi!
>>> 
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of LIKE.
>>> 
>>> Item #1was resolved as CASSANDRA-12573.
>>> 
>>> Regarding to item #2: you said the following:
>>>> A possible fix would be:
>>>> 
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>>> 
>>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>>> 
>>>> Let me try a patch 
>>> 
>>> So is there any update on this?
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>>> 
>>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> Have you had a chance to try your patch or solve the issue in an other way? 
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>> 
>>>>> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
>>>>> 
>>>>> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>>> 
>>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>>>> 
>>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>>> 
>>>>> A possible fix would be:
>>>>> 
>>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>>>> 
>>>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>>>> 
>>>>> Let me try a patch  
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>> Ok you're right, I get your point
>>>>> 
>>>>> LIKE '%%esc%' --> startWith('%esc')
>>>>> 
>>>>> LIKE 'escape%%' -->  = 'escape%'
>>>>> 
>>>>> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
>>>>> 
>>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>>> 
>>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>>> 
>>>>> Let me check in the source code and try to reproduce the issue
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>> Looks like we have different understanding of what results are expected.
>>>>> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>> 
>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
>>>>> 
>>>>> What I’m missing?
>>>>> 
>>>>> Thanks,
>>>>> Mikhail
>>>>> 
>>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>> 
>>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>>> 
>>>>>> I don't see any problem in the results you got
>>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>> 
>>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>> 
>>>>>> 
>>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>>> Thanks for the reply.
>>>>>> Could you please provide what index definition did you use?
>>>>>> With the index from my script I get the following results:
>>>>>> 
>>>>>> cqlsh:test> select * from escape;
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>>   3 | escape%esc
>>>>>> 
>>>>>> Contains search
>>>>>> 
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   3 | escape%esc
>>>>>> (2 rows)
>>>>>> 
>>>>>> 
>>>>>> Prefix search
>>>>>> 
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>>   3 | escape%esc
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikhail 
>>>>>> 
>>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Use % to escape %
>>>>>>> 
>>>>>>> cqlsh:test> select * from escape;
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>>   2 | escape%me
>>>>>>> 
>>>>>>> 
>>>>>>> Contains search
>>>>>>> 
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   1 | %escapeme
>>>>>>> 
>>>>>>> (1 rows)
>>>>>>> 
>>>>>>> 
>>>>>>> Prefix search
>>>>>>> 
>>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>> 
>>>>>>>  id | val
>>>>>>> ----+-----------
>>>>>>>   2 | escape%me
>>>>>>> 
>>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>>>> Hi Cassandra guys,
>>>>>>> 
>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>>>>>> Here is my test script:
>>>>>>> 
>>>>>>> DROP keyspace if exists kmv;
>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>>> USE kmv;
>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>>>>> 'analyzed' : 'true',
>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>>>> 'case_sensitive' : 'false',
>>>>>>> 'mode' : 'CONTAINS'
>>>>>>> };
>>>>>>> 
>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>> 
>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>> 
>>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>>> How to find columns that end with ‘%’?
>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikhail
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Reminder, right now, the % character is only interpreted as wildcard IF AND
ONLY IF it is the first/last character of the searched term


LIKE '%escape' --> ENDS WITH 'escape'

If we use % to escape %,
LIKE '%%escape' -->  EQUALS TO '%escape'

LIKE '%%%escape' --> EQUALS TO '%%escape' ???




On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Hi, Jim,
>
> What pattern should be used to search “ends with  ‘%escape’ “ with your
> conception?
>
> Thanks,
> Mikhail
>
> On 22 Sep 2016, at 18:51, Jim Ancona <ji...@anconafamily.com> wrote:
>
> To answer DuyHai's question without introducing new syntax, I'd suggest:
>
> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'
>
> So the first two %'s are translated to a literal, non-wildcard % and the
> third % is a wildcard because it's not doubled.
>
> Jim
>
> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> I guess that it should be similar to how it is done in SQL for LIKE
>> patterns.
>>
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>>
>> What do you think?
>>
>> Thanks,
>> Mikhail
>>
>> On 22 Sep 2016, at 16:47, DuyHai Doan <do...@gmail.com> wrote:
>>
>> Hello Mikhail
>>
>> It's more complicated that it seems
>>
>> LIKE '%%escape' means  EQUAL TO '%escape'
>>
>> LIKE '%escape' means ENDS WITH 'escape'
>>
>> What's about LIKE '%%%escape' ????
>>
>> How should we treat this case ? Replace %% by % at the beginning of the
>> searched term ??
>>
>>
>>
>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Hi!
>>>
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>>> help of LIKE.
>>>
>>> Item #1was resolved as CASSANDRA-12573.
>>>
>>> Regarding to item #2: you said the following:
>>>
>>> A possible fix would be:
>>>
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>> regex to exclude the %% from parsing e.g
>>>
>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>> converting the bytebuffer into String at this stage of the CQL parser is
>>> expensive or not (in term of computation)
>>>
>>> Let me try a patch
>>>
>>> So is there any update on this?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>>
>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>> Hi!
>>>
>>> Have you had a chance to try your patch or solve the issue in an other
>>> way?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
>>>
>>> Ok so I've found the source of the issue, it's pretty well hidden
>>> because it is NOT in the SASI source code directly.
>>>
>>> Here is the method where C* determines what kind of LIKE expression
>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.j
>>> ava#L733-L778
>>>
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they
>>> forget to remove escape character BEFORE doing the matching so if your
>>> search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>
>>> A possible fix would be:
>>>
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>> regex to exclude the %% from parsing e.g
>>>
>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>> converting the bytebuffer into String at this stage of the CQL parser is
>>> expensive or not (in term of computation)
>>>
>>> Let me try a patch
>>>
>>>
>>>
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com>
>>> wrote:
>>>
>>>> Ok you're right, I get your point
>>>>
>>>> LIKE '%%esc%' --> startWith('%esc')
>>>>
>>>> LIKE 'escape%%' -->  = 'escape%'
>>>>
>>>> What I strongly suspect is that in the source code of SASI, we parse
>>>> the % xxx % expression BEFORE applying escape. That will explain the
>>>> observed behavior. E.g:
>>>>
>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>>
>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>>
>>>> Let me check in the source code and try to reproduce the issue
>>>>
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>>> Looks like we have different understanding of what results are
>>>>> expected.
>>>>> I based my understanding on http://docs.datastax.com/en
>>>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess
>>>>> that there is no semantical difference between two LIKE patterns (both of
>>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>>> *%esc*
>>>>>
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>>> *escape%*esc
>>>>>
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>
>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>>>> returns nothing (CASSANDRA-12573).
>>>>>
>>>>> What I’m missing?
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>>>>
>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>>
>>>>> I don't see any problem in the results you got
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>>> *%esc*
>>>>>
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>>> *escape%*esc
>>>>>
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>
>>>>>
>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>
>>>>>> Thanks for the reply.
>>>>>> Could you please provide what index definition did you use?
>>>>>> With the index from my script I get the following results:
>>>>>>
>>>>>> cqlsh:test> select * from escape;
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>> *  3 | escape%esc*
>>>>>>
>>>>>> Contains search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   3
>>>>>> * | escape%esc*(2 rows)
>>>>>>
>>>>>>
>>>>>> Prefix search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>>   3
>>>>>> * | escape%esc*
>>>>>>
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>>
>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>>>>
>>>>>> Use % to escape %
>>>>>>
>>>>>> cqlsh:test> select * from escape;
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>>
>>>>>>
>>>>>> Contains search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>
>>>>>> (1 rows)
>>>>>>
>>>>>>
>>>>>> Prefix search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>>
>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>>
>>>>>>> Hi Cassandra guys,
>>>>>>>
>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in
>>>>>>> a search pattern.
>>>>>>> Here is my test script:
>>>>>>>
>>>>>>> DROP keyspace if exists kmv;
>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>>> USE kmv;
>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>>>> KEY(id, c1));
>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>>>> 'analyzed' : 'true',
>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>>>> 'case_sensitive' : 'false',
>>>>>>> 'mode' : 'CONTAINS'
>>>>>>> };
>>>>>>>
>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>>
>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>>
>>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>>> How to find columns that end with ‘%’?
>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>>>>> CASSANDRA-12573).
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikhail
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Hi, Jim,

What pattern should be used to search “ends with  ‘%escape’ “ with your conception?

Thanks,
Mikhail
> On 22 Sep 2016, at 18:51, Jim Ancona <ji...@anconafamily.com> wrote:
> 
> To answer DuyHai's question without introducing new syntax, I'd suggest:
>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
> So the first two %'s are translated to a literal, non-wildcard % and the third % is a wildcard because it's not doubled.
> 
> Jim
> 
> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> I guess that it should be similar to how it is done in SQL for LIKE patterns.
> 
> You can introduce an escape character, e.g. ‘\’.
> Examples:
> ‘%’ - any string
> ‘\%’ - equal to ‘%’ character
> ‘\%foo%’ - starts from ‘%foo’
> ‘%%%escape’ - ends with ’escape’
> ‘\%%’ - starts from ‘%’
> ‘\\\%%’ - starts from ‘\%’ .
> 
> What do you think?
> 
> Thanks,
> Mikhail
>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hello Mikhail
>> 
>> It's more complicated that it seems
>> 
>> LIKE '%%escape' means  EQUAL TO '%escape'
>> 
>> LIKE '%escape' means ENDS WITH 'escape'
>> 
>> What's about LIKE '%%%escape' ????
>> 
>> How should we treat this case ? Replace %% by % at the beginning of the searched term ??
>> 
>> 
>> 
>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> Hi!
>> 
>> We’ve talked about two items:
>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of LIKE.
>> 
>> Item #1was resolved as CASSANDRA-12573.
>> 
>> Regarding to item #2: you said the following:
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>> 
>>> Let me try a patch 
>> 
>> So is there any update on this?
>> 
>> Thanks,
>> Mikhail
>> 
>> 
>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> 
>>> Hi!
>>> 
>>> Have you had a chance to try your patch or solve the issue in an other way? 
>>> 
>>> Thanks,
>>> Mikhail
>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
>>>> 
>>>> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>> 
>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>>> 
>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>> 
>>>> A possible fix would be:
>>>> 
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>>> 
>>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>>> 
>>>> Let me try a patch  
>>>> 
>>>> 
>>>> 
>>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>> Ok you're right, I get your point
>>>> 
>>>> LIKE '%%esc%' --> startWith('%esc')
>>>> 
>>>> LIKE 'escape%%' -->  = 'escape%'
>>>> 
>>>> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
>>>> 
>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>> 
>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>> 
>>>> Let me check in the source code and try to reproduce the issue
>>>> 
>>>> 
>>>> 
>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>> Looks like we have different understanding of what results are expected.
>>>> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>>>> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>> 
>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>> 
>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>> 
>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
>>>> 
>>>> What I’m missing?
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>> 
>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>> 
>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>> 
>>>>> I don't see any problem in the results you got
>>>>> 
>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>> 
>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>> 
>>>>> 
>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>> Thanks for the reply.
>>>>> Could you please provide what index definition did you use?
>>>>> With the index from my script I get the following results:
>>>>> 
>>>>> cqlsh:test> select * from escape;
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>>   3 | escape%esc
>>>>> 
>>>>> Contains search
>>>>> 
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   3 | escape%esc
>>>>> (2 rows)
>>>>> 
>>>>> 
>>>>> Prefix search
>>>>> 
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>   3 | escape%esc
>>>>> 
>>>>> Thanks,
>>>>> Mikhail 
>>>>> 
>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>>> 
>>>>>> Use % to escape %
>>>>>> 
>>>>>> cqlsh:test> select * from escape;
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>> 
>>>>>> 
>>>>>> Contains search
>>>>>> 
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>> 
>>>>>> (1 rows)
>>>>>> 
>>>>>> 
>>>>>> Prefix search
>>>>>> 
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>> 
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>> 
>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>>> Hi Cassandra guys,
>>>>>> 
>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>>>>> Here is my test script:
>>>>>> 
>>>>>> DROP keyspace if exists kmv;
>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>> USE kmv;
>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>>>> 'analyzed' : 'true',
>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>>> 'case_sensitive' : 'false',
>>>>>> 'mode' : 'CONTAINS'
>>>>>> };
>>>>>> 
>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>> 
>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>> 
>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>> How to find columns that end with ‘%’?
>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Jim Ancona <ji...@anconafamily.com>.
To answer DuyHai's question without introducing new syntax, I'd suggest:

LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'

So the first two %'s are translated to a literal, non-wildcard % and the
third % is a wildcard because it's not doubled.

Jim

On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> I guess that it should be similar to how it is done in SQL for LIKE
> patterns.
>
> You can introduce an escape character, e.g. ‘\’.
> Examples:
> ‘%’ - any string
> ‘\%’ - equal to ‘%’ character
> ‘\%foo%’ - starts from ‘%foo’
> ‘%%%escape’ - ends with ’escape’
> ‘\%%’ - starts from ‘%’
> ‘\\\%%’ - starts from ‘\%’ .
>
> What do you think?
>
> Thanks,
> Mikhail
>
> On 22 Sep 2016, at 16:47, DuyHai Doan <do...@gmail.com> wrote:
>
> Hello Mikhail
>
> It's more complicated that it seems
>
> LIKE '%%escape' means  EQUAL TO '%escape'
>
> LIKE '%escape' means ENDS WITH 'escape'
>
> What's about LIKE '%%%escape' ????
>
> How should we treat this case ? Replace %% by % at the beginning of the
> searched term ??
>
>
>
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Hi!
>>
>> We’ve talked about two items:
>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>> help of LIKE.
>>
>> Item #1was resolved as CASSANDRA-12573.
>>
>> Regarding to item #2: you said the following:
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>> So is there any update on this?
>>
>> Thanks,
>> Mikhail
>>
>>
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>> Hi!
>>
>> Have you had a chance to try your patch or solve the issue in an other
>> way?
>>
>> Thanks,
>> Mikhail
>>
>> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
>>
>> Ok so I've found the source of the issue, it's pretty well hidden because
>> it is NOT in the SASI source code directly.
>>
>> Here is the method where C* determines what kind of LIKE expression
>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
>> to remove escape character BEFORE doing the matching so if your search is LIKE
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>>
>>
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com>
>> wrote:
>>
>>> Ok you're right, I get your point
>>>
>>> LIKE '%%esc%' --> startWith('%esc')
>>>
>>> LIKE 'escape%%' -->  = 'escape%'
>>>
>>> What I strongly suspect is that in the source code of SASI, we parse the
>>> % xxx % expression BEFORE applying escape. That will explain the observed
>>> behavior. E.g:
>>>
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>
>>> Let me check in the source code and try to reproduce the issue
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>>> Looks like we have different understanding of what results are expected.
>>>> I based my understanding on http://docs.datastax.com/en
>>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>>> According to the doc ‘esc’ is a pattern for exact match and I guess
>>>> that there is no semantical difference between two LIKE patterns (both of
>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>> *%esc*
>>>>
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>
>>>>
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>> *escape%*esc
>>>>
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>
>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>>> returns nothing (CASSANDRA-12573).
>>>>
>>>> What I’m missing?
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>>>
>>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>
>>>> I don't see any problem in the results you got
>>>>
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>> *%esc*
>>>>
>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>
>>>>
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>> *escape%*esc
>>>>
>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>>> Thanks for the reply.
>>>>> Could you please provide what index definition did you use?
>>>>> With the index from my script I get the following results:
>>>>>
>>>>> cqlsh:test> select * from escape;
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>> *  3 | escape%esc*
>>>>>
>>>>> Contains search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   3
>>>>> * | escape%esc*(2 rows)
>>>>>
>>>>>
>>>>> Prefix search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>   3
>>>>> * | escape%esc*
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>>>
>>>>> Use % to escape %
>>>>>
>>>>> cqlsh:test> select * from escape;
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>>
>>>>>
>>>>> Contains search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>
>>>>> (1 rows)
>>>>>
>>>>>
>>>>> Prefix search
>>>>>
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>>
>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>
>>>>>> Hi Cassandra guys,
>>>>>>
>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in
>>>>>> a search pattern.
>>>>>> Here is my test script:
>>>>>>
>>>>>> DROP keyspace if exists kmv;
>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>> USE kmv;
>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>>> KEY(id, c1));
>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>>> 'analyzed' : 'true',
>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>>> 'case_sensitive' : 'false',
>>>>>> 'mode' : 'CONTAINS'
>>>>>> };
>>>>>>
>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>
>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>
>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>> How to find columns that end with ‘%’?
>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>>>> CASSANDRA-12573).
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
I guess that it should be similar to how it is done in SQL for LIKE patterns.

You can introduce an escape character, e.g. ‘\’.
Examples:
‘%’ - any string
‘\%’ - equal to ‘%’ character
‘\%foo%’ - starts from ‘%foo’
‘%%%escape’ - ends with ’escape’
‘\%%’ - starts from ‘%’
‘\\\%%’ - starts from ‘\%’ .

What do you think?

Thanks,
Mikhail
> On 22 Sep 2016, at 16:47, DuyHai Doan <do...@gmail.com> wrote:
> 
> Hello Mikhail
> 
> It's more complicated that it seems
> 
> LIKE '%%escape' means  EQUAL TO '%escape'
> 
> LIKE '%escape' means ENDS WITH 'escape'
> 
> What's about LIKE '%%%escape' ????
> 
> How should we treat this case ? Replace %% by % at the beginning of the searched term ??
> 
> 
> 
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> Hi!
> 
> We’ve talked about two items:
> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of LIKE.
> 
> Item #1was resolved as CASSANDRA-12573.
> 
> Regarding to item #2: you said the following:
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>> 
>> Let me try a patch 
> 
> So is there any update on this?
> 
> Thanks,
> Mikhail
> 
> 
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> 
>> Hi!
>> 
>> Have you had a chance to try your patch or solve the issue in an other way? 
>> 
>> Thanks,
>> Mikhail
>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
>>> 
>>> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>> 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>> 
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>> 
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>>> 
>>> Let me try a patch  
>>> 
>>> 
>>> 
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>> Ok you're right, I get your point
>>> 
>>> LIKE '%%esc%' --> startWith('%esc')
>>> 
>>> LIKE 'escape%%' -->  = 'escape%'
>>> 
>>> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
>>> 
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>> 
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>> 
>>> Let me check in the source code and try to reproduce the issue
>>> 
>>> 
>>> 
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>> 
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>> 
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>> 
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
>>> 
>>> What I’m missing?
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>> 
>>>> I don't see any problem in the results you got
>>>> 
>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>> 
>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>> 
>>>> 
>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>> Thanks for the reply.
>>>> Could you please provide what index definition did you use?
>>>> With the index from my script I get the following results:
>>>> 
>>>> cqlsh:test> select * from escape;
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>>   3 | escape%esc
>>>> 
>>>> Contains search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   3 | escape%esc
>>>> (2 rows)
>>>> 
>>>> 
>>>> Prefix search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>>   3 | escape%esc
>>>> 
>>>> Thanks,
>>>> Mikhail 
>>>> 
>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>>> 
>>>>> Use % to escape %
>>>>> 
>>>>> cqlsh:test> select * from escape;
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>>   2 | escape%me
>>>>> 
>>>>> 
>>>>> Contains search
>>>>> 
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   1 | %escapeme
>>>>> 
>>>>> (1 rows)
>>>>> 
>>>>> 
>>>>> Prefix search
>>>>> 
>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>> 
>>>>>  id | val
>>>>> ----+-----------
>>>>>   2 | escape%me
>>>>> 
>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>>> Hi Cassandra guys,
>>>>> 
>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>>>> Here is my test script:
>>>>> 
>>>>> DROP keyspace if exists kmv;
>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>> USE kmv;
>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>>> 'analyzed' : 'true',
>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>>> 'case_sensitive' : 'false',
>>>>> 'mode' : 'CONTAINS'
>>>>> };
>>>>> 
>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>> 
>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>> 
>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>> How to find columns that end with ‘%’?
>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Mikhail
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Hello Mikhail

It's more complicated that it seems

LIKE '%%escape' means  EQUAL TO '%escape'

LIKE '%escape' means ENDS WITH 'escape'

What's about LIKE '%%%escape' ????

How should we treat this case ? Replace %% by % at the beginning of the
searched term ??



On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Hi!
>
> We’ve talked about two items:
> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
> help of LIKE.
>
> Item #1was resolved as CASSANDRA-12573.
>
> Regarding to item #2: you said the following:
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
> So is there any update on this?
>
> Thanks,
> Mikhail
>
>
> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
> Hi!
>
> Have you had a chance to try your patch or solve the issue in an other
> way?
>
> Thanks,
> Mikhail
>
> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
>
> Ok so I've found the source of the issue, it's pretty well hidden because
> it is NOT in the SASI source code directly.
>
> Here is the method where C* determines what kind of LIKE expression you're
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>
> https://github.com/apache/cassandra/blob/trunk/src/java/
> org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#
> L733-L778
>
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
> to remove escape character BEFORE doing the matching so if your search is LIKE
> '%%esc%', the detected expression is LIKE_CONTAINS.
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
>
>
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com> wrote:
>
>> Ok you're right, I get your point
>>
>> LIKE '%%esc%' --> startWith('%esc')
>>
>> LIKE 'escape%%' -->  = 'escape%'
>>
>> What I strongly suspect is that in the source code of SASI, we parse the
>> % xxx % expression BEFORE applying escape. That will explain the observed
>> behavior. E.g:
>>
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>
>> Let me check in the source code and try to reproduce the issue
>>
>>
>>
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on http://docs.datastax.com/en
>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>>> there is no semantical difference between two LIKE patterns (both of
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>> returns nothing (CASSANDRA-12573).
>>>
>>> What I’m missing?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>>
>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>
>>> I don't see any problem in the results you got
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>>
>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>>> Thanks for the reply.
>>>> Could you please provide what index definition did you use?
>>>> With the index from my script I get the following results:
>>>>
>>>> cqlsh:test> select * from escape;
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>> *  3 | escape%esc*
>>>>
>>>> Contains search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   3
>>>> * | escape%esc*(2 rows)
>>>>
>>>>
>>>> Prefix search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>>   3
>>>> * | escape%esc*
>>>>
>>>> Thanks,
>>>> Mikhail
>>>>
>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>>
>>>> Use % to escape %
>>>>
>>>> cqlsh:test> select * from escape;
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>>
>>>>
>>>> Contains search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>
>>>> (1 rows)
>>>>
>>>>
>>>> Prefix search
>>>>
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>>
>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>>> Hi Cassandra guys,
>>>>>
>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>>>>> search pattern.
>>>>> Here is my test script:
>>>>>
>>>>> DROP keyspace if exists kmv;
>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>> USE kmv;
>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>> KEY(id, c1));
>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>> 'analyzed' : 'true',
>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>> 'case_sensitive' : 'false',
>>>>> 'mode' : 'CONTAINS'
>>>>> };
>>>>>
>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>
>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>
>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>> How to find columns that end with ‘%’?
>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>>> CASSANDRA-12573).
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Hi!

We’ve talked about two items:
1) ‘%’ as a wildcard in the middle of LIKE pattern.
2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of LIKE.

Item #1was resolved as CASSANDRA-12573.

Regarding to item #2: you said the following:
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
> 
> Let me try a patch 

So is there any update on this?

Thanks,
Mikhail


> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <mi...@jetbrains.com> wrote:
> 
> Hi!
> 
> Have you had a chance to try your patch or solve the issue in an other way? 
> 
> Thanks,
> Mikhail
>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
>> 
>> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>> 
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>> 
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>> 
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
>> 
>> Let me try a patch  
>> 
>> 
>> 
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> Ok you're right, I get your point
>> 
>> LIKE '%%esc%' --> startWith('%esc')
>> 
>> LIKE 'escape%%' -->  = 'escape%'
>> 
>> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
>> 
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>> 
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>> 
>> Let me check in the source code and try to reproduce the issue
>> 
>> 
>> 
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>> Why ’starting’? I expect that it should be ‘exact matching’.
>> 
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
>> 
>> What I’m missing?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>> 
>>> I don't see any problem in the results you got
>>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
>> Why ’starting’? I expect that it should be ‘exact matching’.
>> 
>>> 
>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> Thanks for the reply.
>>> Could you please provide what index definition did you use?
>>> With the index from my script I get the following results:
>>> 
>>> cqlsh:test> select * from escape;
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   2 | escape%me
>>>   3 | escape%esc
>>> 
>>> Contains search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   3 | escape%esc
>>> (2 rows)
>>> 
>>> 
>>> Prefix search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   2 | escape%me
>>>   3 | escape%esc
>>> 
>>> Thanks,
>>> Mikhail 
>>> 
>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> Use % to escape %
>>>> 
>>>> cqlsh:test> select * from escape;
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>>   2 | escape%me
>>>> 
>>>> 
>>>> Contains search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   1 | %escapeme
>>>> 
>>>> (1 rows)
>>>> 
>>>> 
>>>> Prefix search
>>>> 
>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>> 
>>>>  id | val
>>>> ----+-----------
>>>>   2 | escape%me
>>>> 
>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>>> Hi Cassandra guys,
>>>> 
>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>>> Here is my test script:
>>>> 
>>>> DROP keyspace if exists kmv;
>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>>> USE kmv;
>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>>> 'analyzed' : 'true',
>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>>> 'case_sensitive' : 'false',
>>>> 'mode' : 'CONTAINS'
>>>> };
>>>> 
>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>> 
>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>> 
>>>> _pattern_ '%%%' finds all columns that contain %.
>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>> How to find columns that end with ‘%’?
>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>>> 
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Hi!

Have you had a chance to try your patch or solve the issue in an other way? 

Thanks,
Mikhail
> On 15 Sep 2016, at 16:02, DuyHai Doan <do...@gmail.com> wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778 <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to remove escape character BEFORE doing the matching so if your search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting the bytebuffer into String at this stage of the CQL parser is expensive or not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % xxx % expression BEFORE applying escape. That will explain the observed behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
> According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   3 | escape%esc
>> (2 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> ----+-----------
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Thanks,
>> Mikhail 
>> 
>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Use % to escape %
>>> 
>>> cqlsh:test> select * from escape;
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   2 | escape%me
>>> 
>>> 
>>> Contains search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>> 
>>> (1 rows)
>>> 
>>> 
>>> Prefix search
>>> 
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>> 
>>>  id | val
>>> ----+-----------
>>>   2 | escape%me
>>> 
>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>>> Hi Cassandra guys,
>>> 
>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>>> Here is my test script:
>>> 
>>> DROP keyspace if exists kmv;
>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>>> USE kmv;
>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>>> 'analyzed' : 'true',
>>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive' : 'false',
>>> 'mode' : 'CONTAINS'
>>> };
>>> 
>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>> 
>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>> 
>>> _pattern_ '%%%' finds all columns that contain %.
>>> How to find columns that start form ‘%’ or ‘%a’?
>>> How to find columns that end with ‘%’?
>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>>> 
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>> 
>> 
> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Ok so I've found the source of the issue, it's pretty well hidden because
it is NOT in the SASI source code directly.

Here is the method where C* determines what kind of LIKE expression you're
using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778

As you can see, it's pretty simple, maybe too simple. Indeed, they forget
to remove escape character BEFORE doing the matching so if your search is LIKE
'%%esc%', the detected expression is LIKE_CONTAINS.

A possible fix would be:

1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
the column data type)
2) remove the escape character e.g. before parsing OR use some advanced
regex to exclude the %% from parsing e.g

Step 2) is dead easy but step 1) is harder because I don't know if
converting the bytebuffer into String at this stage of the CQL parser is
expensive or not (in term of computation)

Let me try a patch



On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <do...@gmail.com> wrote:

> Ok you're right, I get your point
>
> LIKE '%%esc%' --> startWith('%esc')
>
> LIKE 'escape%%' -->  = 'escape%'
>
> What I strongly suspect is that in the source code of SASI, we parse the %
> xxx % expression BEFORE applying escape. That will explain the observed
> behavior. E.g:
>
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>
> Let me check in the source code and try to reproduce the issue
>
>
>
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on http://docs.datastax.com/en
>> /cql/3.3/cql/cql_using/useSASIIndex.html
>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>> there is no semantical difference between two LIKE patterns (both of
>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>> *%esc*
>>
>> Why ‘containing’? I expect that it should be ’starting’..
>>
>>
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>> *starting* with 'escape%' so *escape%*me is a valid result and also
>> *escape%*esc
>>
>> Why ’starting’? I expect that it should be ‘exact matching’.
>>
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>> returns nothing (CASSANDRA-12573).
>>
>> What I’m missing?
>>
>> Thanks,
>> Mikhail
>>
>> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>>
>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>
>> I don't see any problem in the results you got
>>
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>> *%esc*
>>
>> Why ‘containing’? I expect that it should be ’starting’..
>>
>>
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>> *starting* with 'escape%' so *escape%*me is a valid result and also
>> *escape%*esc
>>
>> Why ’starting’? I expect that it should be ‘exact matching’.
>>
>>
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Thanks for the reply.
>>> Could you please provide what index definition did you use?
>>> With the index from my script I get the following results:
>>>
>>> cqlsh:test> select * from escape;
>>>
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   2 | escape%me
>>> *  3 | escape%esc*
>>>
>>> Contains search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   3
>>> * | escape%esc*(2 rows)
>>>
>>>
>>> Prefix search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>
>>>  id | val
>>> ----+-----------
>>>   2 | escape%me
>>>   3
>>> * | escape%esc*
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>>
>>> Use % to escape %
>>>
>>> cqlsh:test> select * from escape;
>>>
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>   2 | escape%me
>>>
>>>
>>> Contains search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>
>>>  id | val
>>> ----+-----------
>>>   1 | %escapeme
>>>
>>> (1 rows)
>>>
>>>
>>> Prefix search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>
>>>  id | val
>>> ----+-----------
>>>   2 | escape%me
>>>
>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>>> Hi Cassandra guys,
>>>>
>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>>>> search pattern.
>>>> Here is my test script:
>>>>
>>>> DROP keyspace if exists kmv;
>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>> USE kmv;
>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>> KEY(id, c1));
>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>> 'analyzed' : 'true',
>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>> si.analyzer.NonTokenizingAnalyzer',
>>>> 'case_sensitive' : 'false',
>>>> 'mode' : 'CONTAINS'
>>>> };
>>>>
>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>
>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>
>>>> _pattern_ '%%%' finds all columns that contain %.
>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>> How to find columns that end with ‘%’?
>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>> ‘%’ char form % as a command symbol? (Also there is a related issue
>>>> CASSANDRA-12573).
>>>>
>>>>
>>>> Thanks,
>>>> Mikhail
>>>
>>>
>>>
>>>
>>
>>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Ok you're right, I get your point

LIKE '%%esc%' --> startWith('%esc')

LIKE 'escape%%' -->  = 'escape%'

What I strongly suspect is that in the source code of SASI, we parse the %
xxx % expression BEFORE applying escape. That will explain the observed
behavior. E.g:

LIKE '%%esc%'  parsed as %xxx% where xxx = %esc

LIKE 'escape%%' parsed as xxx% where xxx =escape%

Let me check in the source code and try to reproduce the issue



On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Looks like we have different understanding of what results are expected.
> I based my understanding on http://docs.datastax.com/
> en/cql/3.3/cql/cql_using/useSASIIndex.html
> According to the doc ‘esc’ is a pattern for exact match and I guess that
> there is no semantical difference between two LIKE patterns (both of
> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
> *containing* '%esc' so *%esc*apeme is a possible match and also escape
> *%esc*
>
> Why ‘containing’? I expect that it should be ’starting’..
>
>
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
> *starting* with 'escape%' so *escape%*me is a valid result and also
> *escape%*esc
>
> Why ’starting’? I expect that it should be ‘exact matching’.
>
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
> returns nothing (CASSANDRA-12573).
>
> What I’m missing?
>
> Thanks,
> Mikhail
>
> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
>
> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
> I don't see any problem in the results you got
>
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
> *containing* '%esc' so *%esc*apeme is a possible match and also escape
> *%esc*
>
> Why ‘containing’? I expect that it should be ’starting’..
>
>
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
> *starting* with 'escape%' so *escape%*me is a valid result and also
> *escape%*esc
>
> Why ’starting’? I expect that it should be ‘exact matching’.
>
>
> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>>
>> cqlsh:test> select * from escape;
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>> *  3 | escape%esc*
>>
>> Contains search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   3
>> * | escape%esc*(2 rows)
>>
>>
>> Prefix search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>
>>  id | val
>> ----+-----------
>>   2 | escape%me
>>   3
>> * | escape%esc*
>>
>> Thanks,
>> Mikhail
>>
>> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>>
>> Use % to escape %
>>
>> cqlsh:test> select * from escape;
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>>
>>
>> Contains search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>
>> (1 rows)
>>
>>
>> Prefix search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>
>>  id | val
>> ----+-----------
>>   2 | escape%me
>>
>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Hi Cassandra guys,
>>>
>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>>> search pattern.
>>> Here is my test script:
>>>
>>> DROP keyspace if exists kmv;
>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>> USE kmv;
>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>> KEY(id, c1));
>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>> 'analyzed' : 'true',
>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive' : 'false',
>>> 'mode' : 'CONTAINS'
>>> };
>>>
>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>
>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>
>>> _pattern_ '%%%' finds all columns that contain %.
>>> How to find columns that start form ‘%’ or ‘%a’?
>>> How to find columns that end with ‘%’?
>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’
>>> char form % as a command symbol? (Also there is a related issue
>>> CASSANDRA-12573).
>>>
>>>
>>> Thanks,
>>> Mikhail
>>
>>
>>
>>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Looks like we have different understanding of what results are expected.
I based my understanding on http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
According to the doc ‘esc’ is a pattern for exact match and I guess that there is no semantical difference between two LIKE patterns (both of patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.

> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
Why ‘containing’? I expect that it should be ’starting’..
> 
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
Why ’starting’? I expect that it should be ‘exact matching’.

Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns nothing (CASSANDRA-12573).

What I’m missing?

Thanks,
Mikhail

> On 13 Sep 2016, at 19:31, DuyHai Doan <do...@gmail.com> wrote:
> 
> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
> 
> I don't see any problem in the results you got
> 
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing '%esc' so %escapeme is a possible match and also escape%esc
Why ‘containing’? I expect that it should be ’starting’..
> 
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting with 'escape%' so escape%me is a valid result and also escape%esc
Why ’starting’? I expect that it should be ‘exact matching’.
> 
> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> Thanks for the reply.
> Could you please provide what index definition did you use?
> With the index from my script I get the following results:
> 
> cqlsh:test> select * from escape;
> 
>  id | val
> ----+-----------
>   1 | %escapeme
>   2 | escape%me
>   3 | escape%esc
> 
> Contains search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
> 
>  id | val
> ----+-----------
>   1 | %escapeme
>   3 | escape%esc
> (2 rows)
> 
> 
> Prefix search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
> 
>  id | val
> ----+-----------
>   2 | escape%me
>   3 | escape%esc
> 
> Thanks,
> Mikhail 
> 
>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Use % to escape %
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>> 
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>> 
>> (1 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> ----+-----------
>>   2 | escape%me
>> 
>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
>> Hi Cassandra guys,
>> 
>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
>> Here is my test script:
>> 
>> DROP keyspace if exists kmv;
>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
>> USE kmv;
>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>> 'analyzed' : 'true',
>> 'analyzer_class' : 'org.apache.cassandra.index.sa <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive' : 'false',
>> 'mode' : 'CONTAINS'
>> };
>> 
>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>> 
>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>> 
>> _pattern_ '%%%' finds all columns that contain %.
>> How to find columns that start form ‘%’ or ‘%a’?
>> How to find columns that end with ‘%’?
>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
>> 
>> 
>> Thanks,
>> Mikhail
>> 
> 
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
CREATE CUSTOM INDEX ON test.escape(val) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

I don't see any problem in the results you got

SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
*containing* '%esc' so *%esc*apeme is a possible match and also escape*%esc*

SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
*starting* with 'escape%' so *escape%*me is a valid result and also
*escape%*esc

On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Thanks for the reply.
> Could you please provide what index definition did you use?
> With the index from my script I get the following results:
>
> cqlsh:test> select * from escape;
>
>  id | val
> ----+-----------
>   1 | %escapeme
>   2 | escape%me
> *  3 | escape%esc*
>
> Contains search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>
>  id | val
> ----+-----------
>   1 | %escapeme
>   3
> * | escape%esc*(2 rows)
>
>
> Prefix search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>
>  id | val
> ----+-----------
>   2 | escape%me
>   3
> * | escape%esc*
>
> Thanks,
> Mikhail
>
> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
>
> Use % to escape %
>
> cqlsh:test> select * from escape;
>
>  id | val
> ----+-----------
>   1 | %escapeme
>   2 | escape%me
>
>
> Contains search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>
>  id | val
> ----+-----------
>   1 | %escapeme
>
> (1 rows)
>
>
> Prefix search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>
>  id | val
> ----+-----------
>   2 | escape%me
>
> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Hi Cassandra guys,
>>
>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>> search pattern.
>> Here is my test script:
>>
>> DROP keyspace if exists kmv;
>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>> 'SimpleStrategy', 'replication_factor':'1'} ;
>> USE kmv;
>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id,
>> c1));
>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>> 'analyzed' : 'true',
>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>> si.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive' : 'false',
>> 'mode' : 'CONTAINS'
>> };
>>
>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>
>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>
>> _pattern_ '%%%' finds all columns that contain %.
>> How to find columns that start form ‘%’ or ‘%a’?
>> How to find columns that end with ‘%’?
>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’
>> char form % as a command symbol? (Also there is a related issue
>> CASSANDRA-12573).
>>
>>
>> Thanks,
>> Mikhail
>
>
>
>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by Mikhail Krupitskiy <mi...@jetbrains.com>.
Thanks for the reply.
Could you please provide what index definition did you use?
With the index from my script I get the following results:

cqlsh:test> select * from escape;

 id | val
----+-----------
  1 | %escapeme
  2 | escape%me
  3 | escape%esc

Contains search

cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

 id | val
----+-----------
  1 | %escapeme
  3 | escape%esc
(2 rows)


Prefix search

cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';

 id | val
----+-----------
  2 | escape%me
  3 | escape%esc

Thanks,
Mikhail 

> On 13 Sep 2016, at 18:16, DuyHai Doan <do...@gmail.com> wrote:
> 
> Use % to escape %
> 
> cqlsh:test> select * from escape;
> 
>  id | val
> ----+-----------
>   1 | %escapeme
>   2 | escape%me
> 
> 
> Contains search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
> 
>  id | val
> ----+-----------
>   1 | %escapeme
> 
> (1 rows)
> 
> 
> Prefix search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
> 
>  id | val
> ----+-----------
>   2 | escape%me
> 
> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <mikhail.krupitskiy@jetbrains.com <ma...@jetbrains.com>> wrote:
> Hi Cassandra guys,
> 
> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search pattern.
> Here is my test script:
> 
> DROP keyspace if exists kmv;
> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'} ;
> USE kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'analyzed' : 'true',
> 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive' : 'false',
> 'mode' : 'CONTAINS'
> };
> 
> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
> 
> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
> 
> _pattern_ '%%%' finds all columns that contain %.
> How to find columns that start form ‘%’ or ‘%a’?
> How to find columns that end with ‘%’?
> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
> 
> 
> Thanks,
> Mikhail
> 


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

Posted by DuyHai Doan <do...@gmail.com>.
Use % to escape %

cqlsh:test> select * from escape;

 id | val
----+-----------
  1 | %escapeme
  2 | escape%me


Contains search

cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

 id | val
----+-----------
  1 | %escapeme

(1 rows)


Prefix search

cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';

 id | val
----+-----------
  2 | escape%me

On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Hi Cassandra guys,
>
> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
> search pattern.
> Here is my test script:
>
> DROP keyspace if exists kmv;
> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
> 'SimpleStrategy', 'replication_factor':'1'} ;
> USE kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id,
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
> 'analyzed' : 'true',
> 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.
> NonTokenizingAnalyzer',
> 'case_sensitive' : 'false',
> 'mode' : 'CONTAINS'
> };
>
> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>
> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>
> _pattern_ '%%%' finds all columns that contain %.
> How to find columns that start form ‘%’ or ‘%a’?
> How to find columns that end with ‘%’?
> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’
> char form % as a command symbol? (Also there is a related issue
> CASSANDRA-12573).
>
>
> Thanks,
> Mikhail