Posted to dev@hive.apache.org by "Lars Francke (JIRA)" <ji...@apache.org> on 2014/09/12 14:00:35 UTC

[jira] [Commented] (HIVE-721) Integration with HadoopDB

    [ https://issues.apache.org/jira/browse/HIVE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131455#comment-14131455 ] 

Lars Francke commented on HIVE-721:
-----------------------------------

There's not much development on HadoopDB anymore, and Tez and Spark are available now. Do you plan to work on this? If not, I suggest closing the issue.

> Integration with HadoopDB
> -------------------------
>
>                 Key: HIVE-721
>                 URL: https://issues.apache.org/jira/browse/HIVE-721
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Azza Abouzeid
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The HadoopDB project integrates Hadoop with single-node databases, which provide a high-performance data layer for analytical queries over structured data. HadoopDB's SMS (SQL-to-MapReduce-to-SQL) component uses Hive's SemanticAnalyzer to convert SQL to MapReduce plans. After plan generation, we recreate SQL from the lower plan operators and push that SQL into the database layer, leaving intact the upper layers of the plan that can't be pushed into the single-node databases. For more information on this process, please read the HadoopDB paper (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf) and browse the source code (in particular the SQLQueryGenerator class) at http://sourceforge.net/projects/hadoopdb/.
> HadoopDB is a natural system-level extension of Hive's goal of providing a simple SQL interface for large-scale data processing.
> A simple patch that integrates Hive with HadoopDB's SMS can be found here: http://hadoopdb.svn.sourceforge.net/viewvc/hadoopdb/trunk/Patches/hive-sms.patch?view=log
> In addition to the semantic analyzer post-processing, we modified certain areas so that paths can be associated with databases, which allows the operator tree to be recreated from the map.input.file configuration. Instead of relying on FileInputSplit, we set up an interface, Pathable, so that any InputSplit that implements Pathable can return a dummy path equivalent to the map.input.file path.
> Instead of the post-semantic-analysis function call to the SQLQueryGenerator class, you could also use hooks. One such suggestion, provided by a HadoopDB user, can be found here: http://sourceforge.net/tracker/index.php?func=detail&aid=2829253&group_id=269559&atid=1146689.
> We would really appreciate your help in better integrating Hive and HadoopDB. 
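
For readers unfamiliar with the Pathable idea in the quoted description, here is a minimal, self-contained sketch of how such an interface might look. It is not taken from the hive-sms patch: the method name getPath, the DatabaseSplit class, the PathableSketch helper, and the path-to-operator-tree map are all hypothetical names chosen for illustration, and the sketch deliberately avoids any Hadoop or Hive types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Pathable interface mentioned in the description.
// The real interface in the patch may use a different method name or return type.
interface Pathable {
    String getPath();
}

// Hypothetical split backed by a single-node database instead of a file.
// It carries a dummy path that plays the role of map.input.file.
final class DatabaseSplit implements Pathable {
    private final String dummyPath;

    DatabaseSplit(String dummyPath) {
        this.dummyPath = dummyPath;
    }

    @Override
    public String getPath() {
        return dummyPath;
    }
}

final class PathableSketch {
    // Returns the path of any split that implements Pathable,
    // without requiring the split to be file-based.
    static String resolvePath(Object split) {
        if (split instanceof Pathable) {
            return ((Pathable) split).getPath();
        }
        throw new IllegalArgumentException("split does not expose a path: " + split);
    }

    // Looks up the operator tree (represented here as a plain string)
    // that was registered for the split's path.
    static String operatorTreeFor(Map<String, String> treesByPath, Object split) {
        String path = resolvePath(split);
        String tree = treesByPath.get(path);
        if (tree == null) {
            throw new IllegalStateException("no operator tree registered for " + path);
        }
        return tree;
    }

    public static void main(String[] args) {
        Map<String, String> treesByPath = new HashMap<>();
        treesByPath.put("/db/sales/chunk0", "TableScan -> Filter -> Select");
        DatabaseSplit split = new DatabaseSplit("/db/sales/chunk0");
        System.out.println(operatorTreeFor(treesByPath, split));
    }
}
```

The point of the sketch is only that the lookup depends on the interface rather than on a concrete file-based split class, so a database-backed split can participate by returning a dummy path.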



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)