Posted to dev@drill.apache.org by Andrey Gusev <an...@siftscience.com> on 2016/05/31 22:54:14 UTC

query pushdown into HBase subscan

Hello Drill,

We're noticing some odd behavior with the following query against an
HBase table.

The key of the table is, roughly speaking,
*8byteHash(string1)8byteHash(string2)*


SELECT CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT') p1_long, ...
from {table}
WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') =
hash_to_long('key_part1') limit 10

The query does seem to return the correct result set, but it times out
on larger tables. hash_to_long is a UDF that I wrote that converts a
string to a long such that the above equality can be satisfied.

It appears that Drill doesn't push this filter down into the subscan
(i.e. a prefix HBase scan), even though the operator profile shows
HBASE_SUB_SCAN:

[image: Inline image 1]

The physical plan starts with an unconstrained full table scan:

Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec
[tableName={table}, startRow=null, stopRow=null, filter=null],


How can we force the where clause to be reflected into scan bounds?
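
For reference, here is a rough sketch of the scan bounds that a pushed-down
prefix filter would correspond to, given the key layout above. It assumes the
first 8 bytes of the row key are the big-endian encoding of a signed 64-bit
hash, which is what the 'BIGINT_BE' conversion in the WHERE clause implies.
The name prefix_scan_bounds and the sample value are made up for
illustration; this is not Drill's actual implementation.

```python
import struct
from typing import Tuple


def prefix_scan_bounds(hash_value: int) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key with this 8-byte prefix.

    Assumes the prefix is a signed 64-bit integer stored big-endian.
    stop_row is exclusive; an empty stop_row means "scan to the end".
    """
    start_row = struct.pack(">q", hash_value)
    # Treat the prefix as an unsigned number and add one to get the first
    # prefix that sorts after every key sharing start_row.
    next_prefix = int.from_bytes(start_row, "big") + 1
    if next_prefix >= 1 << 64:
        # Prefix is all 0xFF bytes: nothing sorts after it.
        return start_row, b""
    return start_row, next_prefix.to_bytes(8, "big")


start, stop = prefix_scan_bounds(42)  # 42 is an arbitrary sample value
print(start.hex(), stop.hex())
```

With bounds like these in the HBaseScanSpec, startRow and stopRow would be
non-null and only the matching key range would be read.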

We're running the latest Drill, 1.6.

Andrey

Re: query pushdown into HBase subscan

Posted by Jason Altekruse <ja...@dremio.com>.
The constant folding feature is turned on by default (and can be disabled
with planner.enable_constant_folding).

It should be able to work with UDFs, as it has access to all of the same
function definitions as our standard resolution/evaluation during full
execution.

In the plan that includes the full scan, does your expression appear in
the filter above the scan as written (i.e. convert_from(...) =
hash_to_long('key_part1')), or has the right hand side been reduced to a
constant value?

The next thing worth trying would be to pre-compute the right hand side
and see whether that gets pushed down.
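
One way to try pre-computing the right hand side is to calculate the hash on
the client and inline it into the query as a literal. A minimal sketch
follows. The real hash_to_long UDF is not shown in this thread, so
hash_to_long_stub is a stand-in (first 8 bytes of SHA-256 read as a signed
big-endian long) that would have to be replaced with whatever the UDF
actually computes; build_query and the table name are likewise hypothetical.

```python
import hashlib
import struct


def hash_to_long_stub(value: str) -> int:
    """Stand-in for the hash_to_long UDF (assumption: NOT the real hash)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


def build_query(table: str, key_part: str) -> str:
    """Build the query with the right-hand side already reduced to a literal."""
    literal = hash_to_long_stub(key_part)
    return (
        "SELECT row_key "
        f"FROM {table} "
        "WHERE CONVERT_FROM(BYTE_SUBSTR(row_key, 1, 8), 'BIGINT_BE') = "
        f"{literal} LIMIT 10"
    )


# "hbase.`my_table`" is a placeholder table name.
print(build_query("hbase.`my_table`", "key_part1"))
```

If the plan for the generated query shows non-null startRow/stopRow, the
problem is in folding the UDF call rather than in the pushdown itself.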




Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, May 31, 2016 at 5:04 PM, Aditya <ad...@gmail.com> wrote:

> Hi Andrey,
>
> Drill currently requires a constant value on the right hand side of a
> comparison operator to push down the filter.
>
> I believe that Jason worked on a constant folding feature which
> evaluates a constant expression during the planning phase and rewrites
> the plan to replace the expression with the corresponding constant value.
>
> Not sure if that works with UDFs as well.
>
> Jason?

Re: query pushdown into HBase subscan

Posted by Aditya <ad...@gmail.com>.
Hi Andrey,

Drill currently requires a constant value on the right hand side of a
comparison operator to push down the filter.

I believe that Jason worked on a constant folding feature which
evaluates a constant expression during the planning phase and rewrites
the plan to replace the expression with the corresponding constant value.
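
As a toy illustration of the idea (not Drill's planner code), constant
folding walks the expression tree and replaces any function call whose
arguments are all literals with its computed value. The tuple-based
expression encoding, the FUNCTIONS registry and the fold function are all
made up for this sketch.

```python
# Expressions are literals (int/str), column references ("col", name),
# or calls ("call", function_name, [args]).
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "length": lambda s: len(s),
}


def is_literal(expr) -> bool:
    return not isinstance(expr, tuple)


def fold(expr):
    """Replace calls whose arguments are all literals with their value."""
    if is_literal(expr) or expr[0] == "col":
        return expr
    _, name, args = expr
    folded_args = [fold(arg) for arg in args]
    if name in FUNCTIONS and all(is_literal(arg) for arg in folded_args):
        return FUNCTIONS[name](*folded_args)
    return ("call", name, folded_args)


# The right-hand side depends only on a literal, so it folds to a constant;
# the left-hand side stays because it depends on a column.
predicate = ("call", "eq", [("col", "row_key"), ("call", "length", ["key_part1"])])
print(fold(predicate))
```

Once the right-hand side is a plain constant, a comparison like this is in
the shape that the filter pushdown expects.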

Not sure if that works with UDFs as well.

Jason?
