You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2014/07/18 20:50:05 UTC
[jira] [Comment Edited] (PHOENIX-1083) IN list of RVC combined with AND doesn't return expected rows

    [ https://issues.apache.org/jira/browse/PHOENIX-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066721#comment-14066721 ] 

James Taylor edited comment on PHOENIX-1083 at 7/18/14 6:49 PM:
----------------------------------------------------------------

Might be multiple bugs here, but one to fix is the case where user is part of the row key:
{code}
CREATE TABLE in_test (tenant_id VARCHAR(5) NOT NULL, id INTEGER NOT NULL, user VARCHAR CONSTRAINT pk PRIMARY KEY (tenant_id, id, user))

upsert into in_test (tenant_id, id, user) values ('a', 1, 'BonA')
upsert into in_test (tenant_id, id, user) values ('a', 2, 'BonB')

select id from in_test WHERE tenant_id = 'a' AND ((id, user) IN ((1, 'BonA'), (1, 'BonB')))
{code}

In this case, this code in WhereOptimizer.pushKeyExpressionsToScan() needs to recognize that it is fully qualified here and the flatten code should get passed the constant value(s) for the columns slots *before* the RVC so that it can prefix the RVC value with them correctly.
{code}
            if (fullyQualifiedColumnCount > 1 && slot.getPKSpan() == fullyQualifiedColumnCount && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
                schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
            }
{code}

Another alternative may be to recognize this in KeyExpressionVisitor.andKeySlots() and re-formulate the RVC constants by prepending them with the prior column constant values.

In the case you brought up, as long as the skip scan filter is not attempted to be used, things should work, as the RVC will be evaluated row-by-row. Is the skip scan filter being used? This is determined in WhereOptimizer.pushKeyExpressionsToScan(), at the end, in the call here:
{code}
        context.setScanRanges(
                ScanRanges.create(cnf, schema, statement.getHint().hasHint(Hint.RANGE_SCAN), nBuckets),
                keySlots.getMinMaxRange());
{code}

I think the problem may be with this code:
{code}
            if (fullyQualifiedColumnCount > 1 && slot.getPKSpan() == fullyQualifiedColumnCount && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
                schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
            }
{code}
It should be something like this instead:
{code}
            if (fullyQualifiedColumnCount > 1) {
                if (      slot.getPKSpan() == fullyQualifiedColumnCount
                    && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
                    schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : 
                        SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
                } else {
                    // Exit early, as if we have a RVC which doesn't span the entire primary constraint,
                    // we cannot optimize it.
                    // TODO: still optimize it if all columns before have constants specified
                    break;
                }
{code}

Would you mind giving this a try, [~kbuzsaki]? I'll be in the office today and we can discuss further then. Thanks!



was (Author: jamestaylor):
Might be multiple bugs here, but one to fix is the case where user is part of the row key:
{code}
CREATE TABLE in_test (tenant_id VARCHAR(5) NOT NULL, id INTEGER NOT NULL, user VARCHAR CONSTRAINT pk PRIMARY KEY (tenant_id, id, user))

upsert into in_test (tenant_id, id, user) values ('a', 1, 'BonA')
upsert into in_test (tenant_id, id, user) values ('a', 2, 'BonB')

select id from in_test WHERE tenant_id = 'a' AND ((id, user) IN ((1, 'BonA'), (1, 'BonB')))
{code}
            if (fullyQualifiedColumnCount > 1 && slot.getPKSpan() == fullyQualifiedColumnCount && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
                schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
            }
{code}

In this case, this code in WhereOptimizer.pushKeyExpressionsToScan() needs to recognize that it is fully qualified and the flatten code should get passed the constant value(s) for the columns slots *before* the RVC so that it can prefix the RVC value with them correctly.

Another alternative may be to recognize this in KeyExpressionVisitor.andKeySlots() and re-formulate the RVC constants by prepending them with the prior column constant values.

In the case you brought up, as long as the skip scan filter is not attempted to be used, things should work, as the RVC will be evaluated row-by-row. Is the skip scan filter being used? This is determined in WhereOptimizer.pushKeyExpressionsToScan(), at the end, in the call here:
{code}
        context.setScanRanges(
                ScanRanges.create(cnf, schema, statement.getHint().hasHint(Hint.RANGE_SCAN), nBuckets),
                keySlots.getMinMaxRange());
{code}

I think the problem may be with this code:
{code}
            if (fullyQualifiedColumnCount > 1 && slot.getPKSpan() == fullyQualifiedColumnCount && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
                schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
            }
{code}
It should be something like this instead:
{code}
            if (fullyQualifiedColumnCount > 1) {
                if (      slot.getPKSpan() == fullyQualifiedColumnCount
                    && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
                    schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : 
                        SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
                } else {
                    // Exit early, as if we have a RVC which doesn't span the entire primary constraint,
                    // we cannot optimize it.
                    // TODO: still optimize it if all columns before have constants specified
                    break;
                }
{code}

Would you mind giving this a try, [~kbuzsaki]? I'll be in the office today and we can discuss further then. Thanks!


> IN list of RVC combined with AND doesn't return expected rows
> -------------------------------------------------------------
>
>                 Key: PHOENIX-1083
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1083
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 4.0.0, 5.0.0
>            Reporter: Samarth Jain
>            Assignee: James Taylor
>
> {code}
> CREATE TABLE in_test ( user VARCHAR, tenant_id VARCHAR(5) NOT NULL,tenant_type_id VARCHAR(3) NOT NULL,  id INTEGER NOT NULL CONSTRAINT pk PRIMARY KEY (tenant_id, tenant_type_id, id))
> upsert into in_test (tenant_id, tenant_type_id, id, user) values ('a', 'a', 1, 'BonA')
> upsert into in_test (tenant_id, tenant_type_id, id, user) values ('a', 'a', 2, 'BonB')
> select id from in_test WHERE tenant_id = 'a' and tenant_type_id = 'a' and ((id, user) IN ((1, 'BonA'),(1, 'BonA')))
> Rows returned - none. Should have returned one row. 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)