Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2016/11/15 19:24:59 UTC

[jira] [Commented] (HIVE-14841) Replication - Phase 2

    [ https://issues.apache.org/jira/browse/HIVE-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668011#comment-15668011 ] 

Sergey Shelukhin commented on HIVE-14841:
-----------------------------------------

Is it possible to do this work in the branch? It causes immense conflicts with the hive-14535 branch, and I see tons of comments with FIXMEs and the like that propose moving code around and refactoring this and that.
I think this should be done on the branch and merged once when ready, so that conflicts with parallel changes to the code affected by the moves are minimized.

> Replication - Phase 2
> ---------------------
>
>                 Key: HIVE-14841
>                 URL: https://issues.apache.org/jira/browse/HIVE-14841
>             Project: Hive
>          Issue Type: New Feature
>          Components: repl
>    Affects Versions: 2.1.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> Per email sent out to the dev list, the current implementation of replication in Hive has certain drawbacks, for instance:
> * Replication follows a rubberbanding pattern, wherein different tables/partitions can be in a different/mixed state on the destination, so that unless all events are caught up on, we do not have an equivalent warehouse. Thus, this only satisfies DR cases, not load-balancing use cases, and the secondary warehouse is really only seen as a backup, rather than as a live warehouse that trails the primary.
> * The base implementation is a naive implementation, and has several performance problems, including a large amount of duplication of data for subsequent events, as mentioned in HIVE-13348, having to copy out entire partitions/tables when just a delta of files might be sufficient/etc. Also, using EXPORT/IMPORT allows us a simple implementation, but at the cost of tons of temporary space, much of which is not actually applied at the destination.
> Thus, to track this, we now create a new branch (repl2) and an uber-jira (this one) to track experimental development towards improvement of this situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)