Posted to user@kudu.apache.org by Clifford Resnick <cr...@mediamath.com> on 2017/07/21 15:43:35 UTC

KuduScanToken pushdown question

I’m playing with Drill’s kudu-storage code and converting it to use the ScanToken API. It’s a fairly simple matter to do this by serializing ScanTokens (thanks for the great API!), but I’m not sure how much to “push down” to Kudu. For tablet/Drillbit affinity and for pruning it’s clear I need to push hash key and range bound predicates, but it can get a bit tricky given that the ScanToken API does not support OR logic unless it fits into an IN_LIST expression, so for more complex logic I suppose it’s best not to push down the filtering. However, I’m wondering if there is a way to push disjoint bounds to Kudu. For example, if I have a table with range keys of Year, Month, Day, is there a way I can include only (2017,6,1) OR (2017,5,5) in a group scan?

Thanks,
Cliff

Re: KuduScanToken pushdown question

Posted by Dan Burkert <da...@apache.org>.
Hi Clifford,

Currently there isn't a way to do that.  If you are 100% sure the PK ranges
don't overlap, you might consider creating multiple sets of scan tokens,
each with a unique range (through separate ScanTokenBuilder instances).
This is more or less what Kudu would do behind the scenes to support
disjoint ranges, anyway.

- Dan
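
A rough sketch of the workaround described above: build one set of scan tokens per disjoint key and concatenate the results. To keep it runnable without a Kudu cluster, token construction is passed in as a callback; the class name DisjointRangeScans, the method tokensForDisjointKeys, and the string stand-in tokens are hypothetical and not part of Kudu or Drill. With the Kudu Java client, the callback would presumably create a fresh builder via client.newScanTokenBuilder(table), add one EQUAL KuduPredicate per range key column, and call build().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper; not part of the Kudu client or Drill's kudu-storage plugin.
final class DisjointRangeScans {

    /**
     * Builds one set of tokens per distinct key and returns them all in one list.
     *
     * @param keys              exact range-key values, e.g. [2017, 6, 1]
     * @param buildTokensForKey stand-in for "new ScanTokenBuilder + predicates + build()"
     */
    static <T> List<T> tokensForDisjointKeys(
            List<List<Integer>> keys,
            Function<List<Integer>, List<T>> buildTokensForKey) {
        // Drop duplicate keys first: overlapping ranges would return rows twice.
        Set<List<Integer>> uniqueKeys = new LinkedHashSet<>(keys);

        List<T> allTokens = new ArrayList<>();
        for (List<Integer> key : uniqueKeys) {
            // Each key gets its own builder invocation, so its predicates
            // never mix with those of another key.
            allTokens.addAll(buildTokensForKey.apply(key));
        }
        return allTokens;
    }

    public static void main(String[] args) {
        List<List<Integer>> keys = Arrays.asList(
                Arrays.asList(2017, 6, 1),
                Arrays.asList(2017, 5, 5));

        // Stand-in builder: a real one would return serialized KuduScanTokens.
        List<String> tokens = tokensForDisjointKeys(
                keys, key -> Arrays.asList("token for " + key));

        tokens.forEach(System.out::println);
    }
}
```

Deduplicating exact keys only covers the simple equality case from the question. For general ranges, the caller would still need to verify that the bounds do not overlap before combining the token sets.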
