Posted to dev@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2014/12/15 03:23:13 UTC

[jira] [Commented] (HIVE-7796) Provide subquery pushdown facility for storage handlers

    [ https://issues.apache.org/jira/browse/HIVE-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246244#comment-14246244 ] 

Ashutosh Chauhan commented on HIVE-7796:
----------------------------------------

Does this work as follows?
* The Phoenix JDBC handler implements {{HiveStorageSubQueryHandler}}.
* Using the source AST, the TokenRewriteStream and QBParseInfo, it tries to recreate the SQL text.
* The Phoenix JDBC handler then sends this query to Phoenix, which parses and plans the SQL.
* The Phoenix JDBC handler then constructs a Hive {{TableScanOperator}}, which it returns via this interface.
* This TSOp is hooked into the Hive pipeline.
* All the data from HBase flows through the Phoenix client to Hive.
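
To make my guess concrete, here is a rough, self-contained Java sketch of that flow. Every name in it ({{Scan}}, {{SubQueryHandler}}, {{FakeJdbcHandler}}, {{runPipeline}}) is a hypothetical stand-in, not taken from the patch or from Hive's API, and the canned rows replace a real JDBC round trip to Phoenix.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class SubQueryPushdownSketch {

    /** Hypothetical stand-in for Hive's TableScanOperator: only a source of rows. */
    interface Scan {
        Iterator<Object[]> rows();
    }

    /** Hypothetical stand-in for the handler interface; the patch's real signature may differ. */
    interface SubQueryHandler {
        /** Returns a scan over the subquery result, or empty if the storage cannot run it. */
        Optional<Scan> pushDown(String subQuerySql);
    }

    /** Toy handler that records the SQL it would send and returns canned rows. */
    static final class FakeJdbcHandler implements SubQueryHandler {
        final List<String> sentQueries = new ArrayList<>();

        @Override
        public Optional<Scan> pushDown(String subQuerySql) {
            // Assumption: only plain SELECT statements are eligible for pushdown.
            if (!subQuerySql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT")) {
                return Optional.empty();
            }
            sentQueries.add(subQuerySql); // a real handler would execute this over JDBC
            List<Object[]> result = new ArrayList<>();
            result.add(new Object[] {"a", 1L});
            result.add(new Object[] {"b", 2L});
            Scan scan = result::iterator;
            return Optional.of(scan);
        }
    }

    /** Drains the scan the way a downstream operator would; every row passes through the client. */
    static List<Object[]> runPipeline(SubQueryHandler handler, String subQuerySql) {
        Scan scan = handler.pushDown(subQuerySql)
                .orElseThrow(() -> new IllegalStateException("not pushed down; regular planning not shown"));
        List<Object[]> out = new ArrayList<>();
        scan.rows().forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        FakeJdbcHandler handler = new FakeJdbcHandler();
        List<Object[]> rows = runPipeline(handler, "SELECT k, COUNT(*) FROM t GROUP BY k");
        System.out.println("rows pulled through the single client: " + rows.size());
    }
}
```

Note that {{runPipeline}} drains a single iterator, so in this sketch every row is funneled through one client-side stream.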

Am I even remotely close about the design? : ) It would help immensely to write up a design doc for this, stating the goal you are trying to achieve.

I am interested in this work, so I want to understand more of it. If the design inferred above is remotely close to what you have implemented, then one area of concern is the last bullet: this design makes the Phoenix client a bottleneck. It would be much more scalable if we could pull data directly from the RegionServers instead of through the Phoenix client.

> Provide subquery pushdown facility for storage handlers
> -------------------------------------------------------
>
>                 Key: HIVE-7796
>                 URL: https://issues.apache.org/jira/browse/HIVE-7796
>             Project: Hive
>          Issue Type: Improvement
>          Components: StorageHandler
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-7796.1.patch.txt
>
>
> If the underlying storage can handle basic filtering or aggregation, Hive can delegate execution of a whole subquery to the storage and treat it as a simple scan operation.
> Experimentally implemented on a JDBC / Phoenix handler, and it seems to work well. I hope to open the code for those too, but I am not allowed to yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)