Posted to dev@accumulo.apache.org by "Sapan Shah (Created) (JIRA)" <ji...@apache.org> on 2012/02/07 18:16:59 UTC

[jira] [Created] (ACCUMULO-378) Multi data center replication

Multi data center replication
-----------------------------

                 Key: ACCUMULO-378
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-378
             Project: Accumulo
          Issue Type: New Feature
            Reporter: Sapan Shah
            Assignee: Sapan Shah
            Priority: Minor


The use case is users who run multiple data centers and need to replicate data between them.  Accumulo can model its replication on the way HBase currently handles it, as detailed here (http://hbase.apache.org/replication.html).

There will be one master cluster and multiple slave clusters.  Accumulo will use the master-push model to replicate the statements from the master cluster's WAL to the WALs of the various slave clusters.
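
A minimal sketch of the master-push idea described above, written in Python for brevity.  The class and method names (MasterCluster, SlaveCluster, push, apply) are hypothetical and are not Accumulo or HBase APIs; the WAL is modeled as a plain in-memory list and the master simply tracks, per slave, the next log offset that slave still needs.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveCluster:
    """Hypothetical stand-in for a remote cluster that replays pushed entries."""
    name: str
    wal: list = field(default_factory=list)

    def apply(self, entries):
        # Replay the pushed entries in order and report how many were applied.
        self.wal.extend(entries)
        return len(entries)


@dataclass
class MasterCluster:
    """Hypothetical master that pushes its WAL to every registered slave."""
    wal: list = field(default_factory=list)
    slaves: list = field(default_factory=list)
    # Next WAL offset each slave still needs, keyed by slave name.
    offsets: dict = field(default_factory=dict)

    def add_slave(self, slave):
        self.slaves.append(slave)
        self.offsets[slave.name] = 0

    def write(self, mutation):
        # Every mutation is logged locally before it is replicated.
        self.wal.append(mutation)

    def push(self):
        # Master-push: the master sends each slave the entries it has not yet applied.
        for slave in self.slaves:
            start = self.offsets[slave.name]
            pending = self.wal[start:]
            if pending:
                self.offsets[slave.name] = start + slave.apply(pending)


master = MasterCluster()
east, west = SlaveCluster("east"), SlaveCluster("west")

master.add_slave(east)
master.write(("row1", "cf:cq", "v1"))
master.push()

master.add_slave(west)  # a slave added later starts from offset 0
master.write(("row2", "cf:cq", "v2"))
master.push()
```

Tracking an offset per slave is one possible design choice; it lets a slave that joins late, or falls behind, catch up from the log without the other slaves receiving duplicates.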

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Sapan Shah (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202550#comment-13202550 ] 

Sapan Shah commented on ACCUMULO-378:
-------------------------------------

I have started some basic work on this, beginning with trying to get the WAL working on HDFS.
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Sapan Shah (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202760#comment-13202760 ] 

Sapan Shah commented on ACCUMULO-378:
-------------------------------------

John: I am currently adapting the WAL to append to a cloned copy in HDFS while still being performant.

Keith:

I think collaborating would be a great idea.  I'll work on getting a design document together.  I will be at the meetup, so we can discuss the various tasks for this there.  I see there being quite a bit to do.

To answer your questions:
1) To begin with, I was thinking about replicating just select tables, so that you would not have complete replicas.  Later we could work on a way to do total replicas.
2) I am still working out a good way to have ZooKeeper send the updates for the user information.  I am not sure about the table metadata yet; if all we are doing is calling the client API, I think that might already be taken care of, since the slave table will maintain its own metadata.
3) What you described (cloning the table, copying the data, and replicating the logs) was my current plan.
4) I have not looked into FATE much, but I will check it out.
5) I am not sure about replicating the splits unless the user defined the splits beforehand.

Let me check into FATE, but from skimming it seems really useful for this.
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202568#comment-13202568 ] 

Keith Turner commented on ACCUMULO-378:
---------------------------------------

This sounds really cool.  I looked at the HBase doc; it seems like it replays the walogs on the slave cluster through the client API.

Were you thinking of doing this for all tables, or just select tables?
What are your thoughts on replicating user and table metadata in ZooKeeper?
What are your thoughts on enabling replication for existing data? (We could clone the table, copy its existing data, and replicate new walogs created after the clone operation.)
How are you thinking of handling bulk imported data? (We could possibly copy it to the slave and bulk import it there also; this could be a FATE operation initiated by the bulk import FATE operation.)
What are your thoughts on replicating split and merge operations on the master cluster?

I am wondering how much we can leverage FATE to make this easier and more reliable.
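
The clone-then-replay idea raised above for existing data could be sketched roughly as follows.  This is only an illustration in Python under simplifying assumptions: tables are plain dicts, the walog is a list of (row, value) pairs, and the function names clone_and_copy and replay_new_walogs are hypothetical, not Accumulo APIs.

```python
def clone_and_copy(master_table, master_wal):
    """Snapshot the master table and note where the walog stood at clone time."""
    clone = dict(master_table)
    clone_offset = len(master_wal)
    return clone, clone_offset


def replay_new_walogs(slave_table, master_wal, offset):
    """Apply walog entries written after the clone; return the new offset."""
    for row, value in master_wal[offset:]:
        slave_table[row] = value
    return len(master_wal)


master_table = {"row1": "v1"}
master_wal = [("row1", "v1")]

# Step 1 and 2: clone the table and copy the existing data to the slave.
slave_table, offset = clone_and_copy(master_table, master_wal)

# Writes that arrive after the clone reach the slave only through the walog.
for row, value in [("row2", "v2"), ("row1", "v1b")]:
    master_table[row] = value
    master_wal.append((row, value))

# Step 3: replicate only the walog entries created after the clone operation.
offset = replay_new_walogs(slave_table, master_wal, offset)
```

Recording the walog position at clone time is the assumption that keeps the two phases from overlapping: data before that point travels with the copy, data after it travels with the log.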
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202799#comment-13202799 ] 

Keith Turner commented on ACCUMULO-378:
---------------------------------------

Replicating table configuration would be useful.  For example, if a user enables an age-off iterator on the master cluster for major compaction, it would be nice to have that run on the slave cluster and throw old data away.  We would want the same iterators, compression, locality groups, etc. configured for the master and slave tables.  I wonder if we could leverage ZOOKEEPER-892.
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212686#comment-13212686 ] 

Keith Turner commented on ACCUMULO-378:
---------------------------------------

Sapan and I were discussing this issue.  We were considering the use case where a user wants to filter some data in a table.  To do this they may add a filter, force a compaction, and then remove the filter.  It would be nice to have this action replicate to the backup cluster.  This may be easier if the action were more atomic, see ACCUMULO-420.
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202803#comment-13202803 ] 

Keith Turner commented on ACCUMULO-378:
---------------------------------------

Replicating all of ZooKeeper would not work well; we would not want to replicate info related to the root tablet location, tablet servers, loggers, and FATE operations from the master cluster.  ZOOKEEPER-892 mentions the ability to replicate a sub-tree.
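
To illustrate what selecting sub-trees might look like, here is a small Python sketch.  The znode paths are made up for the example and are not claimed to match Accumulo's actual ZooKeeper layout, and the helper names are hypothetical; this does not use ZOOKEEPER-892 or any ZooKeeper client API.

```python
# Hypothetical sub-trees worth replicating; the real znode layout may differ.
REPLICATED_SUBTREES = ("/tables", "/users")


def should_replicate(path, subtrees=REPLICATED_SUBTREES):
    """Return True if path is one of the sub-tree roots or lies beneath one."""
    return any(path == root or path.startswith(root + "/") for root in subtrees)


def select_for_replication(znodes, subtrees=REPLICATED_SUBTREES):
    """Keep only the znodes that fall inside a replicated sub-tree."""
    return {
        path: data
        for path, data in znodes.items()
        if should_replicate(path, subtrees)
    }


# Illustrative znodes: table config and users should replicate, while
# cluster-local state (root tablet, tablet servers, loggers, FATE) should not.
znodes = {
    "/tables/1/conf/table.compression": b"gz",
    "/users/alice": b"auths",
    "/root_tablet/location": b"host1:9997",
    "/tservers/host1:9997": b"",
    "/loggers/host2:11224": b"",
    "/fate/tx_01": b"",
    "/tables_backup/x": b"",
}
replicated = select_for_replication(znodes)
```

Matching on the root plus a trailing slash, rather than a bare prefix, keeps a sibling such as /tables_backup from being picked up by accident.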
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202577#comment-13202577 ] 

Keith Turner commented on ACCUMULO-378:
---------------------------------------

I would like to collaborate w/ you on this.  It seems like a starting point might be a design doc.  Would you mind putting together a design doc detailing your thoughts on this?  Any other suggestions on how we could collaborate?  We could also meet at the meetup (http://www.meetup.com/Accumulo-Users-DC/events/45491582/) if you are in this area.
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212691#comment-13212691 ] 

Keith Turner commented on ACCUMULO-378:
---------------------------------------

We were discussing generating secondary indexes.  This feature may be useful for that in addition to replicating to a remote cluster.  So instead of replicating data to a remote cluster, replicate to another table on the local cluster with a data transformation step.  For example, data is inserted in table A, then the mutations from table A get pushed to table B with a transformation step.  This could also push bulk imports to table B and through the transformation.
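
As a rough illustration of the table A to table B flow with a transformation step, consider the Python sketch below.  The mutation shape (row, column, value) and the function names are assumptions made for the example; the transformation shown, which swaps row and value to build an inverted index, is just one possible choice.

```python
def to_index_mutation(mutation):
    """Hypothetical transformation: index by value, pointing back to the source row."""
    row, column, value = mutation
    return (value, column, row)


def replicate_with_transform(mutations, target_table, transform):
    """Apply transform to each source mutation and append the result to the target."""
    for mutation in mutations:
        target_table.append(transform(mutation))


# Mutations inserted into table A ...
table_a = [("doc1", "author", "keith"), ("doc2", "author", "sapan")]

# ... get pushed to table B through the transformation step.
table_b = []
replicate_with_transform(table_a, table_b, to_index_mutation)
```

Passing the transformation in as a function keeps the push mechanism the same whether the target is a remote cluster (identity transform) or a local index table.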
                

[jira] [Commented] (ACCUMULO-378) Multi data center replication

Posted by "John Vines (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202573#comment-13202573 ] 

John Vines commented on ACCUMULO-378:
-------------------------------------

I need a bit of clarification: are you adapting the WAL to log to HDFS via appends, or are you working on a mechanism to shove the logs into HDFS once they are complete?
                