Posted to commits@cassandra.apache.org by "DOAN DuyHai (JIRA)" <ji...@apache.org> on 2016/09/20 15:43:20 UTC
[jira] [Created] (CASSANDRA-12674) [SASI] Confusing AND/OR semantics for StandardAnalyzer
DOAN DuyHai created CASSANDRA-12674:
---------------------------------------
Summary: [SASI] Confusing AND/OR semantics for StandardAnalyzer
Key: CASSANDRA-12674
URL: https://issues.apache.org/jira/browse/CASSANDRA-12674
Project: Cassandra
Issue Type: Bug
Components: sasi
Environment: Cassandra 3.7
Reporter: DOAN DuyHai
{code:sql}
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use test;
cqlsh:test> CREATE TABLE sasi_bug(id int, clustering int, val text, PRIMARY KEY((id), clustering));
cqlsh:test> CREATE CUSTOM INDEX ON sasi_bug(val) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'analyzed': 'true'};
//1st example SAME PARTITION KEY
cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 1, 'homeworker');
cqlsh:test> INSERT INTO sasi_bug(id, clustering , val ) VALUES(1, 2, 'hardworker');
cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%work home%';
 id | clustering | val
----+------------+------------
  1 |          1 | homeworker
  1 |          2 | hardworker

(2 rows)
//2nd example DIFFERENT PARTITION KEY
cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(10, 1, 'speedrun');
cqlsh:test> INSERT INTO sasi_bug(id, clustering, val) VALUES(11, 1, 'longrun');
cqlsh:test> SELECT * FROM sasi_bug WHERE val LIKE '%long run%';
 id | clustering | val
----+------------+---------
 11 |          1 | longrun

(1 rows)
{code}
In the 1st example, both rows belong to the same partition, so SASI returns both values. Indeed, {{LIKE '%work home%'}} means {{contains 'work' OR 'home'}}, so the result makes sense.
In the 2nd example, only one row is returned, whereas we expect 2 rows: {{LIKE '%long run%'}} means {{contains 'long' OR 'run'}}, so *speedrun* should be returned too.
So where is the problem? Explanation:
When there is only 1 predicate, the root operation type is an *AND*:
{code:java|title=QueryPlan}
private Operation analyze()
{
    try
    {
        Operation.Builder and = new Operation.Builder(OperationType.AND, controller);
        controller.getExpressions().forEach(and::add);
        return and.complete();
    }
    ...
}
{code}
During the parsing of {{LIKE '%long run%'}}, SASI creates 2 expressions for the searched term, {{long}} and {{run}}, which corresponds to *OR* logic. However, the following piece of code loses that *OR* logic:
{code:java|title=Operation}
public Operation complete()
{
    if (!expressions.isEmpty())
    {
        ListMultimap<ColumnDefinition, Expression> analyzedExpressions = analyzeGroup(controller, op, expressions);
        RangeIterator.Builder<Long, Token> range = controller.getIndexes(op, analyzedExpressions.values());
        ...
}
{code}
As you can see, we blindly take all the *values* of the {{ListMultimap}} (which contains a single entry for the {{val}} column with 2 expressions) and pass them to {{controller.getIndexes(...)}}:
{code:java|title=QueryController}
public RangeIterator.Builder<Long, Token> getIndexes(OperationType op, Collection<Expression> expressions)
{
    if (resources.containsKey(expressions))
        throw new IllegalArgumentException("Can't process the same expressions multiple times.");
    RangeIterator.Builder<Long, Token> builder = op == OperationType.OR
                                                 ? RangeUnionIterator.<Long, Token>builder()
                                                 : RangeIntersectionIterator.<Long, Token>builder();
    ...
}
{code}
Because the root operation has type *AND*, {{RangeIntersectionIterator}} is used to combine both expressions, {{long}} and {{run}}.
So when the data belong to different partitions, the *AND* logic applies and eliminates _speedrun_.
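The effect can be reproduced with a toy model in plain Java. This is only a sketch and not SASI code: the class name, the method names and the term-to-partition map are hypothetical, and the partition ids are taken from the 2nd example. It contrasts what an intersection and a union of the two per-term result sets would yield.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT SASI code: each search term maps to the set of partition ids
// whose indexed value contains that term (ids taken from the 2nd example).
final class PartitionCombineSketch
{
    // Hypothetical stand-in for the per-term index lookup.
    static final Map<String, Set<Integer>> INDEX = Map.of(
        "long", Set.of(11),       // 'longrun'
        "run",  Set.of(10, 11));  // 'speedrun' and 'longrun'

    // AND semantics: keep only the partitions matched by every term.
    static Set<Integer> intersection(String... terms)
    {
        Set<Integer> result = new TreeSet<>(INDEX.get(terms[0]));
        for (String term : terms)
            result.retainAll(INDEX.get(term));
        return result;
    }

    // OR semantics: keep the partitions matched by at least one term.
    static Set<Integer> union(String... terms)
    {
        Set<Integer> result = new TreeSet<>();
        for (String term : terms)
            result.addAll(INDEX.get(term));
        return result;
    }

    public static void main(String[] args)
    {
        System.out.println(intersection("long", "run")); // prints [11]
        System.out.println(union("long", "run"));        // prints [10, 11]
    }
}
```

In this model the intersection drops partition 10 (_speedrun_), which matches the observed query result, while the union keeps both partitions, which is what the report expects.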
When the data belong to the same partition but to different rows, {{RangeIntersectionIterator}} returns a single partition; the rows are then filtered further by {{operationTree.satisfiedBy}}, and the results are correct:
{code:java|title=QueryPlan}
while (currentKeys.hasNext())
{
    DecoratedKey key = currentKeys.next();
    if (!keyRange.right.isMinimum() && keyRange.right.compareTo(key) < 0)
        return endOfData();
    try (UnfilteredRowIterator partition = controller.getPartition(key, executionController))
    {
        Row staticRow = partition.staticRow();
        List<Unfiltered> clusters = new ArrayList<>();
        while (partition.hasNext())
        {
            Unfiltered row = partition.next();
            if (operationTree.satisfiedBy(row, staticRow, true))
                clusters.add(row);
        }
        ...
}
{code}
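To make the two-stage behaviour concrete, here is a minimal, self-contained Java sketch. It is not SASI code: the class, the methods and the in-memory data are hypothetical, and it assumes that the per-row check accepts a row as soon as one term matches, which is what the output of the 1st example suggests. Stage 1 keeps a partition only if every term occurs somewhere in it (the *AND* step); stage 2 then filters the rows of the surviving partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model, NOT SASI code: partition-level AND followed by a row-level filter.
final class RowFilterSketch
{
    // Hypothetical data: partition id -> row values, taken from both examples.
    static final Map<Integer, List<String>> PARTITIONS = Map.of(
        1,  List.of("homeworker", "hardworker"),
        10, List.of("speedrun"),
        11, List.of("longrun"));

    // Stage 1 (AND): a partition survives only if every term occurs in some row of it.
    static boolean partitionSurvives(int id, List<String> terms)
    {
        return terms.stream().allMatch(
            term -> PARTITIONS.get(id).stream().anyMatch(value -> value.contains(term)));
    }

    // Stage 2 (assumed OR): within a surviving partition, keep rows matching any term.
    static List<String> query(List<String> terms)
    {
        List<String> rows = new ArrayList<>();
        for (int id : new int[]{ 1, 10, 11 })
        {
            if (!partitionSurvives(id, terms))
                continue;
            for (String value : PARTITIONS.get(id))
                if (terms.stream().anyMatch(value::contains))
                    rows.add(value);
        }
        return rows;
    }

    public static void main(String[] args)
    {
        System.out.println(query(List.of("work", "home"))); // prints [homeworker, hardworker]
        System.out.println(query(List.of("long", "run")));  // prints [longrun]
    }
}
```

Under these assumptions the model reproduces both cqlsh results: the same-partition query looks like an *OR*, while the cross-partition query loses _speedrun_ at stage 1.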
/cc [~xedin] [~ifesdjeen]