Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/11/30 07:59:11 UTC

[jira] Created: (HIVE-1813) Hive should be able to run on multiple data centers

Hive should be able to run on multiple data centers
---------------------------------------------------

                 Key: HIVE-1813
                 URL: https://issues.apache.org/jira/browse/HIVE-1813
             Project: Hive
          Issue Type: New Feature
            Reporter: Namit Jain
             Fix For: 0.7.0


Currently, Hive assumes a single metastore, and HADOOP_HOME is passed as an environment variable.

It would be desirable to support Hive on top of multiple data centers (dfs + mr).

For example, there could be two metastores: primary and secondary. They would have different dfs's, and the
metastore would maintain a dfs->mr mapping.
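
As a rough illustration of the dfs->mr mapping described above, the lookup could be sketched as follows. This is only a sketch under assumptions: the host names, ports, and the function name mr_cluster_for are hypothetical and are not part of Hive or of this proposal.

```python
from urllib.parse import urlparse

# Hypothetical dfs -> mr mapping. The host names and ports are invented
# for illustration; in the proposal this mapping lives in the metastore.
DFS_TO_MR = {
    "hdfs://nn-primary:8020": "jt-primary:8021",
    "hdfs://nn-secondary:8020": "jt-secondary:8021",
}


def mr_cluster_for(location):
    """Return the MR cluster mapped to the dfs that hosts `location`."""
    parsed = urlparse(location)
    dfs = f"{parsed.scheme}://{parsed.netloc}"
    if dfs not in DFS_TO_MR:
        raise KeyError(f"no MR cluster registered for {dfs}")
    return DFS_TO_MR[dfs]
```

With such a mapping, a query over a table stored on the secondary dfs would be submitted to the MR cluster registered for that dfs.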

Hive would be enhanced to support multiple metastores, and all operations (ddl + query) would span multiple metastores.

Different pluggable consistency policies can be employed. For example, if a table/partition is present in both metastores with different
last modification times, either the most recently modified one can be used or an error can be thrown.
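
The two policies mentioned above could be sketched roughly as below. This is a minimal sketch, not a proposed Hive interface: PartitionEntry, ConflictError, latest_wins, fail_on_conflict and resolve are all hypothetical names, and modification times are assumed to be epoch seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionEntry:
    """One metastore's view of a table/partition (hypothetical shape)."""
    metastore: str       # e.g. "primary" or "secondary"
    location: str        # dfs path of the data
    last_modified: int   # assumed to be epoch seconds


class ConflictError(Exception):
    """Raised when metastores disagree and the policy refuses to choose."""


def latest_wins(entries):
    # Policy 1: use the entry with the most recent modification time.
    return max(entries, key=lambda e: e.last_modified)


def fail_on_conflict(entries):
    # Policy 2: throw an error if the modification times differ.
    if len({e.last_modified for e in entries}) > 1:
        raise ConflictError("metastores disagree on last modification time")
    return entries[0]


def resolve(entries, policy=latest_wins):
    """Pick the entry to use; `policy` is the pluggable part."""
    if not entries:
        raise LookupError("table/partition not found in any metastore")
    return policy(entries)
```

Passing the policy as a plain function keeps it pluggable: a deployment could swap in a different rule without touching the lookup itself.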

It will be up to the application (outside Hive) to copy the data from one metastore to another, and to keep the copies consistent.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1813) Hive should be able to run on multiple data centers

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965126#action_12965126 ] 

Namit Jain commented on HIVE-1813:
----------------------------------

The data can be copied from one dfs to another using distcp; later on, a wrapper for this can be developed in Hive.
Something like:

alter table <T> partition <P> copy <src> to <dst>;
alter table <T> partition <P> move <src> to <dst>;
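
Until such a wrapper exists, an application outside Hive could plan the equivalent steps itself. The sketch below only builds the command lines and does not run them. It assumes that "move" means "copy with distcp, then delete the source", which the comment above does not spell out; the function names are hypothetical, and the delete step uses `hadoop fs -rm -r`, whose exact form may vary by Hadoop version.

```python
def distcp_command(src, dst):
    """Command line for copying src to dst with distcp."""
    return ["hadoop", "distcp", src, dst]


def plan(action, src, dst):
    """Return the list of commands for a 'copy' or 'move' of a partition."""
    if action not in ("copy", "move"):
        raise ValueError(f"unsupported action: {action}")
    steps = [distcp_command(src, dst)]
    if action == "move":
        # Assumption: a move is a copy followed by deleting the source.
        steps.append(["hadoop", "fs", "-rm", "-r", src])
    return steps
```

The commands are returned as argument lists so that a caller can hand each one to a process runner, and can stop before the delete step if the copy fails.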
