Posted to dev@accumulo.apache.org by Josh Elser <jo...@gmail.com> on 2014/04/01 03:58:20 UTC

Review Request 19862: Design document for review on cross-cluster replication

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/
-----------------------------------------------------------

Review request for accumulo.


Bugs: ACCUMULO-378
    https://issues.apache.org/jira/browse/ACCUMULO-378


Repository: accumulo


Description
-------

Re-posting a version of the design doc that I own. Contains grammatical fixes from round one, with a few extra clarifications. New content should be posted here, but I'll maintain the old review as discussion progresses.


Diffs
-----

  docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 

Diff: https://reviews.apache.org/r/19862/diff/


Testing
-------


Thanks,

Josh Elser

Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.

The formatting really didn't come across well at all. It may be easier 
to read it on the google doc (I tried to put effort into the new section 
I added to be clear).

https://docs.google.com/document/d/1MHwINIVV2kT5x54zrd3jBjbLbA0h-pLfPSdSsxehyZQ/

RB is nice for having localized discussion, but it sure does suck to try 
to write formatted, easy-to-read text in it.

On 4/3/14, 5:16 PM, Josh Elser wrote:
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19862/
>
>
> Review request for accumulo.
> By Josh Elser.
>
> /Updated April 3, 2014, 9:16 p.m./
>
>
>   Changes
>
> Added a large section about implementation of "bookkeeping" on the "master" cluster. Fixed line-wraps, and a couple of other grammatical nits from other reviews.
>
> *Bugs: * ACCUMULO-378 <https://issues.apache.org/jira/browse/ACCUMULO-378>
> *Repository: * accumulo
>
>
>   Description
>
> Re-posting a version of the design doc that I own. Contains grammatical fixes from round one, with a few extra clarifications. New content should be posted here, but I'll maintain the old review as discussion progresses.
>
>
>   Diffs (updated)
>
>   * docs/src/main/resources/design/ACCUMULO-378-design.mdtext (PRE-CREATION)
>
> View Diff <https://reviews.apache.org/r/19862/diff/>
>

Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/
-----------------------------------------------------------

(Updated April 3, 2014, 9:16 p.m.)


Review request for accumulo.


Changes
-------

Added a large section about implementation of "bookkeeping" on the "master" cluster. Fixed line-wraps, and a couple of other grammatical nits from other reviews.


Bugs: ACCUMULO-378
    https://issues.apache.org/jira/browse/ACCUMULO-378


Repository: accumulo


Description
-------

Re-posting a version of the design doc that I own. Contains grammatical fixes from round one, with a few extra clarifications. New content should be posted here, but I'll maintain the old review as discussion progresses.


Diffs (updated)
-----

  docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 

Diff: https://reviews.apache.org/r/19862/diff/


Testing
-------


Thanks,

Josh Elser

Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Mike Drob <md...@mdrob.com>.


> On April 2, 2014, 5:44 p.m., Josh Elser wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 62
> > <https://reviews.apache.org/r/19862/diff/1/?file=543190#file543190line62>
> >
> >     Need to consider what kind of authentication/authorization is done before the slave will accept data from a "master". 
> >     
> >     The master needs to know the slave's secret?

If we are not encrypting this communication, as stated in the "non-goals" section, then I am very uncomfortable with sending a cluster secret over the wire.


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39304
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.


> On April 2, 2014, 5:44 p.m., Josh Elser wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 62
> > <https://reviews.apache.org/r/19862/diff/1/?file=543190#file543190line62>
> >
> >     Need to consider what kind of authentication/authorization is done before the slave will accept data from a "master". 
> >     
> >     The master needs to know the slave's secret?
> 
> Mike Drob wrote:
>     If we are not encrypting this communication, as stated in the "non-goals" section, then I am very uncomfortable with sending a cluster secret over the wire.

Well, it wouldn't have to be the instance.secret. It could be some extra configuration piece. That said, a separate secret really only adds obscurity, so is there any (meaningful) added benefit over simply accepting connections from anyone?


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39304
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39304
-----------------------------------------------------------



docs/src/main/resources/design/ACCUMULO-378-design.mdtext
<https://reviews.apache.org/r/19862/#comment71651>

    Need to consider what kind of authentication/authorization is done before the slave will accept data from a "master". 
    
    The master needs to know the slave's secret?


- Josh Elser



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Mike Drob <md...@mdrob.com>.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.
> 
> kturner wrote:
>     Why not support replicating bulk imports?  Seems like it makes things easier on users.
> 
> Mike Drob wrote:
>     Then the ID mapping is a worry.
> 
> Josh Elser wrote:
>     When configuring the replication, we would just track the source tableID and the destination cluster and the destination tableID. Am I missing something?

If we're shipping WALs around, then the slave has to know the mapping from source table ID to destination table ID. Then you need to have an extra code path that checks for a mapping before performing "recovery."

If we have cyclic replication, then you have to know which WAL you are shipping, because that could imply a different mapping. Master table x maps to slave table y maps to other slave table z. If we have master-master, then both sides need to know the mapping, so I guess the table needs to exist on both clusters before replication can be configured (so that we have a table ID to use in the configuration).

Also, if we're shipping WALs around, then it is possible that you have 99 mutations for a table that isn't replicated and 1 mutation that is replicated. Sending offsets and chunks can help minimize the bandwidth, but...
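
To make the mapping and bandwidth concerns concrete, here is a minimal sketch in Python. It is purely illustrative and is not Accumulo code: `WalEntry`, `entries_to_ship`, and the `table_map` dictionary are hypothetical names, and a real WAL is not a list of in-memory objects. The idea is only that the sender keeps entries for replicated tables and rewrites the source table ID to the destination table ID before anything goes over the wire.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class WalEntry:
    table_id: str    # table ID as known on the cluster holding this entry
    mutation: bytes  # opaque serialized mutation (hypothetical representation)


def entries_to_ship(wal: Iterable[WalEntry],
                    table_map: Dict[str, str]) -> Iterator[WalEntry]:
    """Yield only entries for replicated tables, rewritten to the destination ID.

    table_map maps source table ID -> destination table ID; tables that are
    absent from the map are not replicated and are skipped.
    """
    for entry in wal:
        dest_id = table_map.get(entry.table_id)
        if dest_id is not None:
            yield WalEntry(dest_id, entry.mutation)


# 99 mutations for an unreplicated table "a", 1 for replicated table "x".
wal = [WalEntry("a", f"m{i}".encode()) for i in range(99)]
wal.append(WalEntry("x", b"replicated"))

shipped = list(entries_to_ship(wal, {"x": "y"}))
print(len(shipped), shipped[0].table_id)  # prints "1 y"
```

Filtering on the sending side still means reading the whole log to find the one relevant entry, which is the cost that offsets and chunks would only partly reduce.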


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>

I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Mike Drob <md...@mdrob.com>.

> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >

Huh, RB proxy error ate my comment.

I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.

Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.

What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.

- Mike

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------


Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Mike Drob <md...@mdrob.com>.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.
> 
> kturner wrote:
>     Why not support replicating bulk imports?  Seems like it makes things easier on users.

Then the ID mapping is a worry.


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.
> 
> kturner wrote:
>     Why not support replicating bulk imports?  Seems like it makes things easier on users.
> 
> Mike Drob wrote:
>     Then the ID mapping is a worry.

When configuring the replication, we would just track the source tableID and the destination cluster and the destination tableID. Am I missing something?


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.
> 
> kturner wrote:
>     Why not support replicating bulk imports?  Seems like it makes things easier on users.
> 
> Mike Drob wrote:
>     Then the ID mapping is a worry.
> 
> Josh Elser wrote:
>     When configuring the replication, we would just track the source tableID and the destination cluster and the destination tableID. Am I missing something?
> 
> Mike Drob wrote:
>     If we're shipping WALs around, then the slave has to know the mapping from source table ID to destination table ID. Then you need to have an extra code path that checks for a mapping before performing "recovery."
>     
>     If we have cyclic replication, then you have to know which WAL you are shipping, because that could imply a different mapping. Master table x maps to slave table y maps to other slave table z. If we have master-master, then both sides need to know the mapping, so I guess the table needs to exist on both clusters before replication can be configured (so that we have a table ID to use in the configuration).
>     
>     Also, if we're shipping WALs around, then it is possible that you have 99 mutations for a table that isn't replicated and 1 mutation that is replicated. Sending offsets and chunks can help minimize the bandwidth, but...
> 
> Mike Drob wrote:
>     Actually, another thing that would be really cool is self-replication where I clone a table and then replicate future writes to it.

re: table mapping

The destination tableID could be included in the message from the source. Then, the slave would just have to do some validation that it has a table with such an ID. I wasn't initially considering the slave having the replication configuration of the master. A couple of security concerns arise again here (although I think they're general to the problem).
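
As a rough illustration of that validation step, a sketch in Python follows. Everything in it is an assumption made for the example, not an existing Accumulo API: `ReplicationMessage`, `UnknownTableError`, and `accept` are hypothetical names, and the slave's set of table IDs is passed in directly rather than looked up from any real metadata.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class ReplicationMessage:
    source_cluster: str           # name of the sending cluster (hypothetical field)
    dest_table_id: str            # table ID on the receiving cluster
    mutations: Tuple[bytes, ...]  # opaque serialized mutations


class UnknownTableError(Exception):
    """Raised when the message names a table ID the slave does not have."""


def accept(message: ReplicationMessage, local_table_ids: Set[str]) -> int:
    """Validate the destination table ID and return how many mutations to apply."""
    if message.dest_table_id not in local_table_ids:
        raise UnknownTableError(
            f"no table with ID {message.dest_table_id!r} on this cluster")
    # Applying the mutations is out of scope here; only the check is shown.
    return len(message.mutations)


msg = ReplicationMessage("cluster1", "y", (b"m1", b"m2"))
print(accept(msg, {"y", "z"}))  # prints 2
```

Note that this check only confirms the table exists. It says nothing about whether the sender is allowed to write to it, which is the security concern mentioned above.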

re: cyclic replication

To handle cycles, both sides need to have a replication configuration, yes, but they don't need to know each other's. Cluster 1 knows to send to cluster 2, and cluster 2 knows to send to cluster 1. To prevent re-replication, cluster 1 just needs to know not to send data to cluster 2 that originated from cluster 2. This will be of interest in state-keeping (~repl records).
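
A minimal sketch of that rule in Python, assuming every update carries a tag naming the cluster where it was first written. The `Update` type, its `origin` field, and `updates_to_send` are hypothetical names for illustration; how the origin would actually be recorded is exactly the open state-keeping question.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Update:
    origin: str  # cluster where the write was first made (assumed to be tracked)
    data: str


def updates_to_send(local_log: List[Update], peer: str) -> List[Update]:
    """Return the updates that should go to `peer`, skipping its own writes."""
    return [u for u in local_log if u.origin != peer]


# Cluster 1's log holds one local write and one write replicated in from cluster 2.
cluster1_log = [Update("cluster1", "a"), Update("cluster2", "b")]

to_cluster2 = updates_to_send(cluster1_log, "cluster2")
print([u.data for u in to_cluster2])  # prints ['a']
```

This covers the two-cluster case described here. Topologies with more clusters may need more than a single origin tag, for example a record of every cluster an update has already visited.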

re: new/cloned tables

We had touched on remote table configuration propagation before, and while I still think that scope is outside the original plan, I'd agree that auto-replication of new tables should be added to the list of future work. For this go-around, we definitely need to firm up how a new table is added to replication. Assuming the slave is in some "read-only" mode, are table operations still permitted?


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by ke...@deenlo.com.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.

Why not support replicating bulk imports?  Seems like it makes things easier on users.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Mike Drob <md...@mdrob.com>.


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned that they do not support replicated bulk import. Their recommended solution is just to externally copy files and run bulk import on the slave. Since this is something that is possible for users to configure themselves, I'd like to make sure we focus on the difficult case of live ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate all of the tables on a system, or you replicate none of them, but just a defined set is not allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does the clone need to be replicated? I assume no. If the slave is a read-only implementation, can I make clones there to run MR? Maybe another thing that will come out of this is 'transient clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication for tables would (likely) have to be turned on, at which point the offline-MR case isn't a worry.
> 
> kturner wrote:
>     Why not support replicating bulk imports?  Seems like it makes things easier on users.
> 
> Mike Drob wrote:
>     Then the ID mapping is a worry.
> 
> Josh Elser wrote:
>     When configuring the replication, we would just track the source tableID and the destination cluster and the destination tableID. Am I missing something?
> 
> Mike Drob wrote:
>     If we're shipping WALs around, then the slave has to know the mapping from source table ID to destination table ID. Then you need to have an extra code path that checks for a mapping before performing "recovery."
>     
>     If we have cyclic replication, then you have to know which WAL you are shipping, because that could imply a different mapping. Master table x maps to slave table y maps to other slave table z. If we have master-master, then both sides need to know the mapping, so I guess the table needs to exist on both clusters before replication can be configured (so that we have a table ID to use in the configuration).
>     
>     Also, if we're shipping WALs around, then it is possible that you have 99 mutations for a table that isn't replicated and 1 mutation that is replicated. Sending offsets and chunks can help minimize the bandwidth, but...

Actually, another thing that would be really cool is self-replication where I clone a table and then replicate future writes to it.
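
To make the table ID mapping from the quoted discussion concrete, here is a minimal sketch. It is not taken from the design doc: the configuration layout, the `ReplicationTarget` type and the `entries_for_peer` helper are all hypothetical names, and WAL entries are simplified to (source table ID, mutation) pairs. It shows the two points raised above: only mutations for replicated tables need to be shipped, and the source table ID has to be translated per destination cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationTarget:
    """Hypothetical record: where one source table is replicated to."""
    peer: str             # assumed name of the destination cluster
    remote_table_id: str  # table ID on that destination cluster


# Hypothetical per-table configuration: source table ID -> list of targets.
# Tables that do not appear here are simply not replicated.
REPLICATION_CONFIG = {
    "2": [ReplicationTarget("slave1", "7")],
    "5": [ReplicationTarget("slave1", "9"), ReplicationTarget("slave2", "3")],
}


def entries_for_peer(wal_entries, peer, config=REPLICATION_CONFIG):
    """Yield (remote_table_id, mutation) for WAL entries destined for `peer`.

    `wal_entries` is an iterable of (source_table_id, mutation) pairs, a
    simplification of a real write-ahead log. Entries for tables without a
    target on `peer` are skipped rather than shipped.
    """
    for source_table_id, mutation in wal_entries:
        for target in config.get(source_table_id, ()):
            if target.peer == peer:
                yield target.remote_table_id, mutation
```

With a WAL holding mutations for tables "2", "4" and "5", only "2" and "5" would go to slave1 (rewritten to IDs "7" and "9"), and only "5" would go to slave2. Cyclic or master-master setups would need the same kind of mapping on each side.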


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Mike Drob <md...@mdrob.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------


- Mike Drob



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.


> On April 2, 2014, 3:15 p.m., Josh Elser wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 62
> > <https://reviews.apache.org/r/19862/diff/1/?file=543190#file543190line62>
> >
> >     Thinking about this from a total ordering standpoint. Say we're replicating to two slaves, and we have three rfiles to replicate (1, 2 and 3) to those two slaves.
> >     
> >     We replicate rfile1 to both, but then the link to slave2 goes down. We can still replicate rfile2 and then rfile3 to slave1, while we try to send rfile2 to slave2.
> >     
> >     What if, instead of the link being down, we happen to communicate with an angry server inside of slave2 that never completes the transfer? In that case we don't want to transfer rfile3, in an attempt to better preserve global ordering.
> >     
> >     This can be restated as "we only want to replicate one 'file' to a slave at a time" so that we preserve the original semantics of the replication "queue" (table). The problem is that this could drastically slow down replication when the link between master and slave cannot be saturated by one replication task at a time.
> >     
> >     This isn't anything that we can reliably guarantee now (without conditional mutations), right? Is it worth trying to tackle? The one clear change I want to make is that we do want to put the identifier for the slave in with the replication record rather than defer determination of where a record should be replicated.
> 
> kturner wrote:
>     I also think transferring files should be an external concern, like Mike said. One way this could work is the following.
>     
>      1. Cluster A exports a batch of file URIs and a control file (similar to export table)
>      2. The user distcps the URIs and control file
>      3. The control file and the directory containing the distcp'd files are provided to cluster B to import
>     
>     The difference between this and import/export table is that it's stateful. Export on Cluster A provides the list of changes since the last export. The control file contains ordering information about how to apply the files. The control file also contains ordering information about other import/exports. But this process is incomplete. The feedback process would need to be worked out. The entire process should be resilient to users trying to apply things out of order.

What would the ordering guarantees be that we would need to provide for bulk import? Ordering of when bulk-imports finished on the master? If you have two tables and two bulk imports which each had a file to each of those tables, could you have interleaved imports? e.g. table1 sees the file from import1 and then import2, while table2 sees the file from import2 and then import1?

I am leaning towards not supporting bulk import replication for simplicity's sake on a first go now.


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39260
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by ke...@deenlo.com.


> On April 2, 2014, 3:15 p.m., Josh Elser wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 62
> > <https://reviews.apache.org/r/19862/diff/1/?file=543190#file543190line62>
> >
> >     Thinking about this from a total ordering standpoint. Say we're replicating to two slaves, and we have three rfiles to replicate (1, 2 and 3) to those two slaves.
> >     
> >     We replicate rfile1 to both, but then the link to slave2 goes down. We can still replicate rfile2 and then rfile3 to slave1, while we try to send rfile2 to slave2.
> >     
> >     What if, instead of the link being down, we happen to communicate with an angry server inside of slave2 that never completes the transfer? In that case we don't want to transfer rfile3, in an attempt to better preserve global ordering.
> >     
> >     This can be restated as "we only want to replicate one 'file' to a slave at a time" so that we preserve the original semantics of the replication "queue" (table). The problem is that this could drastically slow down replication when the link between master and slave cannot be saturated by one replication task at a time.
> >     
> >     This isn't anything that we can reliably guarantee now (without conditional mutations), right? Is it worth trying to tackle? The one clear change I want to make is that we do want to put the identifier for the slave in with the replication record rather than defer determination of where a record should be replicated.

I also think transferring files should be an external concern, like Mike said. One way this could work is the following.

 1. Cluster A exports a batch of file URIs and a control file (similar to export table)
 2. The user distcps the URIs and control file
 3. The control file and the directory containing the distcp'd files are provided to cluster B to import

The difference between this and import/export table is that it's stateful. Export on Cluster A provides the list of changes since the last export. The control file contains ordering information about how to apply the files. The control file also contains ordering information about other import/exports. But this process is incomplete. The feedback process would need to be worked out. The entire process should be resilient to users trying to apply things out of order.
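
As a rough illustration of the stateful export idea, the sketch below models the control file as a small dictionary. Everything in it is an assumption made for illustration: the `Exporter` and `Importer` classes, the `seq`/`previous` fields, and the choice to reject out-of-order batches outright. The actual distcp step and the feedback path from cluster B back to cluster A are left out.

```python
class Exporter:
    """Hypothetical cluster A side: remembers what has already been exported."""

    def __init__(self):
        self._pending = []   # file URIs created since the last export
        self._last_seq = 0   # sequence number of the last export

    def record_file(self, uri):
        self._pending.append(uri)

    def export(self):
        """Return a control record listing only changes since the last export."""
        self._last_seq += 1
        control = {
            "seq": self._last_seq,
            "previous": self._last_seq - 1,  # ordering relative to other exports
            "files": list(self._pending),    # order in which to apply the files
        }
        self._pending.clear()
        return control


class Importer:
    """Hypothetical cluster B side: refuses batches applied out of order."""

    def __init__(self):
        self.applied_seq = 0
        self.files = []

    def apply(self, control):
        if control["previous"] != self.applied_seq:
            raise ValueError(
                "expected export %d, got export %d"
                % (self.applied_seq + 1, control["seq"])
            )
        self.files.extend(control["files"])
        self.applied_seq = control["seq"]
```

If a user copies two exports and tries to import the second one first, `apply` raises instead of silently reordering data; importing them in sequence succeeds. Whether rejection or buffering is the better response is one of the open questions above.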


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39260
-----------------------------------------------------------



Re: Review Request 19862: Design document for review on cross-cluster replication

Posted by Josh Elser <jo...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39260
-----------------------------------------------------------



docs/src/main/resources/design/ACCUMULO-378-design.mdtext
<https://reviews.apache.org/r/19862/#comment71626>

    Thinking about this from a total ordering standpoint. Say we're replicating to two slaves, and we have three rfiles to replicate (1, 2 and 3) to those two slaves.
    
    We replicate rfile1 to both, but then the link to slave2 goes down. We can still replicate rfile2 and then rfile3 to slave1, while we try to send rfile2 to slave2.
    
    What if, instead of the link being down, we happen to communicate with an angry server inside of slave2 that never completes the transfer? In that case we don't want to transfer rfile3, in an attempt to better preserve global ordering.
    
    This can be restated as "we only want to replicate one 'file' to a slave at a time" so that we preserve the original semantics of the replication "queue" (table). The problem is that this could drastically slow down replication when the link between master and slave cannot be saturated by one replication task at a time.
    
    This isn't anything that we can reliably guarantee now (without conditional mutations), right? Is it worth trying to tackle? The one clear change I want to make is that we do want to put the identifier for the slave in with the replication record rather than defer determination of where a record should be replicated.
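
    A minimal sketch of the "one file per slave at a time" idea, with the slave identifier stored alongside each replication record. The `ReplicationQueue` class and its method names are hypothetical, and an in-memory structure stands in for the replication table; nothing here addresses the conditional-mutation question.

```python
from collections import deque


class ReplicationQueue:
    """Hypothetical in-memory stand-in for the replication "queue" (table).

    Each slave gets its own ordered queue, so a record is effectively keyed
    by (slave, file) and a stalled slave does not hold up the others.
    """

    def __init__(self, peers, files):
        self._pending = {peer: deque(files) for peer in peers}

    def next_record(self, peer):
        """Return the single (peer, file) record that may be in flight now."""
        queue = self._pending[peer]
        return (peer, queue[0]) if queue else None

    def acknowledge(self, peer, filename):
        """Mark a file as replicated; only the head of the queue is allowed."""
        queue = self._pending[peer]
        if not queue or queue[0] != filename:
            raise ValueError("%s must be replicated to %s in order" % (filename, peer))
        queue.popleft()
```

    In the scenario above, slave1 can work through rfile1, rfile2 and rfile3 while slave2 stays on rfile2 until that transfer is acknowledged. The cost is the one already noted: a single in-flight file per slave may not saturate the link.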


- Josh Elser

