Posted to dev@phoenix.apache.org by "Kyle Buzsaki (JIRA)" <ji...@apache.org> on 2014/07/17 23:12:05 UTC

[jira] [Commented] (PHOENIX-1082) IN List of RVCs doesn't return all the rows when executed against a tenant-specific view for a multi-tenant table that is salted.

    [ https://issues.apache.org/jira/browse/PHOENIX-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065556#comment-14065556 ] 

Kyle Buzsaki commented on PHOENIX-1082:
---------------------------------------

It looks like the issue is in the WhereOptimizer when constructing the keySlots / cnf and determining the schema. There's a mismatch in their construction that leads to the SkipScanFilter improperly filtering out values.

With a salted multi-tenant table as described above and the query
{code}
select pk2, pk3 from t_view WHERE (pk2, pk3) IN (('helo3',  3),  ('helo5',  5)) ORDER BY pk2
{code}

You get the following cnf / slots passed to the SkipScanFilter:
{code}
[[\x00, \x01, \x02, \x03], [ABCDEF], [[helo3\x00\x80\x00\x00\x03 - helo5\x00\x80\x00\x00\x05]]]
{code}
The first slot is the salt byte, the second slot is the tenant id, and the third slot is the range of concatenated bytes allowed by the InListExpression in the where clause ( WHERE (pk2, pk3) IN (('helo3',  3),  ('helo5',  5)) ).
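As a rough illustration of where the bytes in the third slot come from, here is a standalone sketch (not Phoenix code) that rebuilds them. It assumes the varchar is written as its UTF-8 bytes followed by a zero separator byte, and that the integer is written as 4 big-endian bytes with the sign bit flipped, which is what the \x80\x00\x00\x03 in the output above suggests. The class and method names (RowKeySketch, rvcKey, printable) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class RowKeySketch {
    static final byte SEPARATOR_BYTE = 0;

    // 4-byte big-endian int with the sign bit flipped, so 3 becomes 80 00 00 03.
    static byte[] encodeInt(int v) {
        int flipped = v ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
    }

    // Concatenates pk2, the separator, and pk3 (hypothetical helper for one RVC entry).
    static byte[] rvcKey(String pk2, int pk3) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] s = pk2.getBytes(StandardCharsets.UTF_8);
        out.write(s, 0, s.length);
        out.write(SEPARATOR_BYTE);
        byte[] i = encodeInt(pk3);
        out.write(i, 0, i.length);
        return out.toByteArray();
    }

    // Renders printable ASCII as-is and everything else as \xNN.
    static String printable(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (b >= 32 && b < 127 && b != '\\') {
                sb.append((char) b);
            } else {
                sb.append(String.format("\\x%02x", b & 0xff));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(printable(rvcKey("helo3", 3))); // helo3\x00\x80\x00\x00\x03
        System.out.println(printable(rvcKey("helo5", 5))); // helo5\x00\x80\x00\x00\x05
    }
}
```

The two printed values match the lower and upper bounds of the range shown in the third slot.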

When the SkipScanFilter is actually filtering the rows passed to it, it incorrectly evaluates this row key, returning ReturnCode.SEEK_NEXT_USING_HINT instead of ReturnCode.INCLUDE:
{code}
\x00ABCDEF\x00helo3\x00\x80\x00\x00\x03/0:C1/300/Put/vlen=4/mvcc=2
{code}
Stepping through the SkipScanFilter's navigate method, the issue happens when evaluating the last slot. The ImmutableBytesPointer that the schema maps the relevant section of the row key to is incomplete. Instead of "helo3\x00\x80\x00\x00\x03" as seen in the row key, the ptr is only initialized to "helo3". This comes from the following code in schema.next():
{code}
Field field = this.getField(position);
if (field.getDataType().isFixedWidth()) {
    ptr.set(ptr.get(),ptr.getOffset(), field.getByteSize());
} else {
    if (position+1 == getFieldCount() ) { // Last field has no terminator
        ptr.set(ptr.get(), ptr.getOffset(), maxOffset - ptr.getOffset());
    } else {
        byte[] buf = ptr.get();
        int offset = ptr.getOffset();
        while (offset < maxOffset && buf[offset] != SEPARATOR_BYTE) {
            offset++;
        }
        ptr.set(buf, ptr.getOffset(), offset - ptr.getOffset());
    }
}
{code}
The row key schema has two fields left, one for the pk2 varchar, and one for the pk3 integer. Because of this, it only reads up to the separator byte after "helo3". 
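To make the difference concrete, here is a simplified, standalone imitation of that loop (not the Phoenix implementation). It walks the same key bytes once with a two-field layout (varchar, then a 4-byte integer) and once with a single variable-length field, which is what a flattened schema would amount to. The SchemaWalkSketch class, the split method, and the int[] width encoding (0 meaning variable length) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaWalkSketch {
    static final byte SEPARATOR_BYTE = 0;

    // widths[i] > 0: fixed-width field of that many bytes; widths[i] == 0: variable length.
    static List<byte[]> split(byte[] key, int[] widths) {
        List<byte[]> fields = new ArrayList<>();
        int offset = 0;
        for (int position = 0; position < widths.length; position++) {
            boolean variable = widths[position] == 0;
            boolean last = position + 1 == widths.length;
            int length;
            if (!variable) {
                length = widths[position];
            } else if (last) {
                length = key.length - offset; // last field has no terminator
            } else {
                int end = offset;
                while (end < key.length && key[end] != SEPARATOR_BYTE) {
                    end++;
                }
                length = end - offset; // stops at the first separator byte
            }
            fields.add(Arrays.copyOfRange(key, offset, offset + length));
            offset += length;
            if (variable && !last && offset < key.length) {
                offset++; // step over the separator
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] key = {'h', 'e', 'l', 'o', '3', 0, (byte) 0x80, 0, 0, 3};
        // Two fields: the first one ends at the separator, so it is only "helo3".
        System.out.println(split(key, new int[] {0, 4}).get(0).length); // 5
        // One variable-length field: it covers the whole concatenated value.
        System.out.println(split(key, new int[] {0}).get(0).length);    // 10
    }
}
```

With the two-field layout the first pointer covers 5 bytes, so comparing it against a slot range whose bounds are 10 bytes long cannot match the way the range was intended.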

There is a section of code intended to deal with this in the WhereOptimizer:
{code}
// We support (a,b) IN ((1,2),(3,4)), so in this case we switch to a flattened schema
if (fullyQualifiedColumnCount > 1 && slot.getPKSpan() == fullyQualifiedColumnCount && !EVERYTHING_RANGES.equals(slot.getKeyRanges())) {
    schema = nBuckets == null ? SchemaUtil.VAR_BINARY_SCHEMA : SaltingUtil.VAR_BINARY_SALTED_SCHEMA;
}
{code}
In the case of multi-tenant tables, however, the fullyQualifiedColumnCount will be increased to account for the tenant id primary key, and the slot's pkSpan will be too small.
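A small sketch of how that condition behaves, using illustrative numbers only. The counts below are assumptions inferred from the table definition (pk2 and pk3 covered by the RVC, plus tenantId counted on the multi-tenant table), not values read from a debugger, and FlattenCheckSketch is a made-up class that just mirrors the boolean test.

```java
class FlattenCheckSketch {
    // Mirrors the shape of the WhereOptimizer condition; everythingRange stands in
    // for EVERYTHING_RANGES.equals(slot.getKeyRanges()).
    static boolean switchesToFlattenedSchema(int fullyQualifiedColumnCount,
                                             int pkSpan,
                                             boolean everythingRange) {
        return fullyQualifiedColumnCount > 1
                && pkSpan == fullyQualifiedColumnCount
                && !everythingRange;
    }

    public static void main(String[] args) {
        // Assumed non-multi-tenant case: the RVC spans both PK columns (pk2, pk3).
        System.out.println(switchesToFlattenedSchema(2, 2, false)); // true
        // Assumed multi-tenant case: tenantId is counted too, but the RVC still spans 2.
        System.out.println(switchesToFlattenedSchema(3, 2, false)); // false
    }
}
```

Under these assumptions the equality check fails as soon as the tenant id is counted, so the schema is never flattened and the two-field walk described above is used.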

I'm going to investigate fixes for this further, but it seems like we'll need to introduce a new method of overriding the schema when RowValueConstructors are used with multi-tenant tables.

> IN List of RVCs doesn't return all the rows when executed against a tenant-specific view for a multi-tenant table that is salted.
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1082
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1082
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>
> {code}
> Table type - Multitenant and salted. Query - IN list of RVCs. Result - All rows not returned.
> Base table DDL - CREATE TABLE t (tenantId varchar(5) NOT NULL, pk2 varchar(5) NOT NULL, pk3 INTEGER NOT NULL, c1 INTEGER constraint pk primary key (tenantId,pk2,pk3)) MULTI_TENANT=true, SALT_BUCKETS=4
> Tenant View DDL - CREATE VIEW t_view (tenant_col VARCHAR) AS SELECT * FROM t
> Upserts:
> upsert into t_view (pk2, pk3, c1) values ('helo1', 1, 1)
> upsert into t_view (pk2, pk3, c1) values ('helo2', 2, 2)
> upsert into t_view (pk2, pk3, c1) values ('helo3', 3, 3)
> upsert into t_view (pk2, pk3, c1) values ('helo4', 4, 4)
> upsert into t_view (pk2, pk3, c1) values ('helo5', 5, 5)
> Query using tenant specific connection - select pk2, pk3 from t_view WHERE (pk2, pk3) IN ( ('helo3',  3),  ('helo5',  5) ) ORDER BY pk2
> Result - Only one row returned - helo3, 3 
> This has likely to do with salting because on removing SALT_BUCKETS=4 from the base table DDL all the expected rows are returned.
> {code}


