Posted to dev@hive.apache.org by "Azza Abouzeid (JIRA)" <ji...@apache.org> on 2009/08/04 17:44:14 UTC

[jira] Created: (HIVE-721) Integration with HadoopDB

Integration with HadoopDB
-------------------------

                 Key: HIVE-721
                 URL: https://issues.apache.org/jira/browse/HIVE-721
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
    Affects Versions: 0.4.0
            Reporter: Azza Abouzeid
            Priority: Minor
             Fix For: 0.4.0


The HadoopDB project integrates Hadoop with single-node databases, which provide a high-performance data layer for analytical queries over structured data. HadoopDB's SMS (SQL-to-MapReduce-to-SQL) component uses Hive's SemanticAnalyzer to convert SQL to MapReduce plans. After plan generation, we recreate SQL from the lower plan operators and push that SQL into the database layer. The upper layers of the plan, which can't be pushed into the single-node databases, are left intact. For more information on this process, please read the HadoopDB paper (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf) and browse the source code if you feel like it (more specifically, the SQLQueryGenerator class) at http://sourceforge.net/projects/hadoopdb/.
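
To make the plan split concrete, here is a minimal sketch of the idea, not HadoopDB's actual SQLQueryGenerator. It assumes a toy linear operator chain listed bottom-up, and it assumes that only a leading scan, filters, and one projection are pushable; the names Op, SplitPlan, and PushdownSketch are hypothetical and do not come from Hive or HadoopDB.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy plan operator (hypothetical; not one of Hive's operator classes). */
final class Op {
    final String kind; // "SCAN", "FILTER", "SELECT", or anything else
    final String arg;  // table name, predicate, or column list
    Op(String kind, String arg) { this.kind = kind; this.arg = arg; }
}

/** SQL for the database layer plus the operators left in the MapReduce plan. */
final class SplitPlan {
    final String sql;
    final List<Op> remaining;
    SplitPlan(String sql, List<Op> remaining) { this.sql = sql; this.remaining = remaining; }
}

final class PushdownSketch {
    /**
     * Folds the lower operators (SCAN, then FILTERs, then an optional SELECT)
     * into one SQL string. Assumption: nothing above that point is pushable.
     */
    static SplitPlan split(List<Op> bottomUp) {
        if (bottomUp.isEmpty() || !bottomUp.get(0).kind.equals("SCAN")) {
            throw new IllegalArgumentException("plan must start with a SCAN");
        }
        String table = bottomUp.get(0).arg;
        int i = 1;

        // Collect consecutive filters; parenthesize each to keep precedence.
        List<String> predicates = new ArrayList<>();
        while (i < bottomUp.size() && bottomUp.get(i).kind.equals("FILTER")) {
            predicates.add("(" + bottomUp.get(i).arg + ")");
            i++;
        }

        // At most one projection directly above the filters.
        String columns = "*";
        if (i < bottomUp.size() && bottomUp.get(i).kind.equals("SELECT")) {
            columns = bottomUp.get(i).arg;
            i++;
        }

        StringBuilder sql = new StringBuilder("SELECT " + columns + " FROM " + table);
        if (!predicates.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", predicates));
        }
        // Everything from index i upward stays in the MapReduce plan untouched.
        List<Op> remaining = new ArrayList<>(bottomUp.subList(i, bottomUp.size()));
        return new SplitPlan(sql.toString(), remaining);
    }
}
```

In this sketch the database sees a single SELECT statement, while any operator the loop does not recognize (a group-by, for example) remains in the upper part of the plan.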

HadoopDB is a natural system-level extension of Hive's goal of providing a simple SQL interface for large-scale data processing.

A simple patch that integrates Hive with HadoopDB's SMS can be found here: http://hadoopdb.svn.sourceforge.net/viewvc/hadoopdb/trunk/Patches/hive-sms.patch?view=log

In addition to the semantic analyzer post-processing, we modified certain areas so that paths can be associated with databases, which allows the operator tree to be recreated from the map.input.file configuration. Instead of relying on FileInputSplit, we set up an interface, Pathable, so that any InputSplit implementing Pathable can return a dummy path equivalent to the map.input.file path.
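
As a rough illustration of the Pathable idea, the sketch below shows one possible shape for it. The real interface is defined in the patch linked above; here the method name getPath, the DatabaseSplit and PathResolver classes, and the "/db/..." path layout are all assumptions, and a plain String stands in for Hadoop's Path type so that the example has no Hadoop dependency.

```java
/** Hypothetical shape of Pathable; the method name is an assumption. */
interface Pathable {
    /** Returns a dummy path that plays the role of the map.input.file value. */
    String getPath();
}

/** Hypothetical split backed by a single-node database instead of a file. */
final class DatabaseSplit implements Pathable {
    private final String relation; // table served by the database
    private final int chunkId;     // node-local chunk covered by this split

    DatabaseSplit(String relation, int chunkId) {
        this.relation = relation;
        this.chunkId = chunkId;
    }

    @Override
    public String getPath() {
        // No file backs this split, so synthesize a path the plan lookup can match.
        return "/db/" + relation + "/chunk-" + chunkId;
    }
}

final class PathResolver {
    /** Any split that implements Pathable can supply its own path. */
    static String resolve(Object split) {
        if (split instanceof Pathable) {
            return ((Pathable) split).getPath();
        }
        throw new IllegalArgumentException("split does not expose a path: " + split);
    }
}
```

The point of the indirection is that code which previously needed a file-based split can ask any Pathable split for a path and then look up the matching operator tree as before.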

Instead of calling the SQLQueryGenerator class after semantic analysis, you could also use hooks. One such suggestion, provided by a HadoopDB user, can be found here: http://sourceforge.net/tracker/index.php?func=detail&aid=2829253&group_id=269559&atid=1146689.

We would really appreciate your help in better integrating Hive and HadoopDB. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-721) Integration with HadoopDB

Posted by "Azza Abouzeid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740139#action_12740139 ] 

Azza Abouzeid commented on HIVE-721:
------------------------------------

BTW: we checked out the code from trunk around July 10th, 2009.

> Integration with HadoopDB
> -------------------------
>
>                 Key: HIVE-721
>                 URL: https://issues.apache.org/jira/browse/HIVE-721
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Azza Abouzeid
>            Priority: Minor
>             Fix For: 0.4.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h



[jira] Updated: (HIVE-721) Integration with HadoopDB

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-721:
----------------------------

    Fix Version/s:     (was: 0.5.0)

> Integration with HadoopDB
> -------------------------
>
>                 Key: HIVE-721
>                 URL: https://issues.apache.org/jira/browse/HIVE-721
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Azza Abouzeid
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
