You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "eugen yushin (JIRA)" <ji...@apache.org> on 2016/05/11 13:05:12 UTC
[jira] [Updated] (DRILL-4666) Pushdown doesn't apply for HBase with substr(key) from UT

     [ https://issues.apache.org/jira/browse/DRILL-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

eugen yushin updated DRILL-4666:
--------------------------------
    Description: 
Following [example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java] section, running query from {{testFilterPushDownCompositeBigIntRowKey1()}} results in following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
     CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
    ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
    ,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
    CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02        SelectionVectorRemover
00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
{code}

From the above, Drill uses full scan and then filters out rows by key substring started from 1st position.

This query executes pretty fast in test dataset provided in repo, but performance dramatically decreases with real use cases.

I've used _contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_ to populate test table.

Moreover, [TestHBaseFilterPushDown|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java] uses [runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java] to pass the tests. It checks result set count, and not execution plan.

  was:
Following [example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java] section, running query from {{testFilterPushDownCompositeBigIntRowKey1()}} results in following execution plan:
{code}
EXPLAIN PLAN FOR
SELECT
     CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
    ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
    ,CONVERT_FROM(tableName.f.c, 'UTF8')
FROM hbase.`TestTableCompositeDate` tableName
WHERE
    CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
00-02        SelectionVectorRemover
00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
{code}

From the above, Drill uses full scan and then filters out rows by key substring started from 1st position.

This query executes pretty fast in test dataset provided in repo, but performance dramatically decreases with real use cases.

I've used _contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_ to populate test table.

Moreover, [TestHBaseFilterPushDown|TestHBaseFilterPushDown.java] uses [runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java] to pass the tests. It checks result set count, and not execution plan.


> Pushdown doesn't apply for HBase with substr(key) from UT
> ---------------------------------------------------------
>
>                 Key: DRILL-4666
>                 URL: https://issues.apache.org/jira/browse/DRILL-4666
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: eugen yushin
>
> Following [example|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java] section, running query from {{testFilterPushDownCompositeBigIntRowKey1()}} results in following execution plan:
> {code}
> EXPLAIN PLAN FOR
> SELECT
>      CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') d
>     ,CONVERT_FROM(BYTE_SUBSTR(row_key, 9, 8), 'bigint_be') id
>     ,CONVERT_FROM(tableName.f.c, 'UTF8')
> FROM hbase.`TestTableCompositeDate` tableName
> WHERE
>     CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'bigint_be') = cast(1409040000000 as bigint)
> ;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(d=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 1, 8))], id=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($0, 9, 8))], EXPR$2=[CONVERT_FROMUTF8(ITEM($1, 'c'))])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[=(CONVERT_FROM(BYTE_SUBSTR($0, 1, 8), 'bigint_be'), 1409040000000)])
> 00-04            Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=TestTableCompositeDate, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
> {code}
> From the above, Drill uses full scan and then filters out rows by key substring started from 1st position.
> This query executes pretty fast in test dataset provided in repo, but performance dramatically decreases with real use cases.
> I've used _contrib\storage-hbase\src\test\java\org\apache\drill\hbase\TestTableGenerator.java_ to populate test table.
> Moreover, [TestHBaseFilterPushDown|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseFilterPushDown.java] uses [runHBaseSQLVerifyCount|https://github.com/apache/drill/blob/95623912ebf348962fe8a8846c5f47c5fdcf2f78/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java] to pass the tests. It checks result set count, and not execution plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)