Posted to user@hbase.apache.org by Mark Kerzner <ke...@shmsoft.com> on 2011/02/24 17:00:41 UTC

HBase replication

Hi,

I have two HBase instances running in separate clusters, and ideally I would like to
synchronize them: records not found in one should be copied over to the
other, and vice versa.

Now, I do know that master-slave replication already exists in 0.89, but
that master-master is experimental from 0.90 on, with some of it left for
0.92, and that it will only copy tables from one HBase that are not present in
the other.

Are there any other approaches that can get me as close to this kind of
synchronization as possible?

Thank you,
Mark
-- 
View this message in context: http://old.nabble.com/HBase-replication-tp31005404p31005404.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: HBase replication

Posted by Jean-Daniel Cryans <jd...@apache.org>.

A single scan can read as many families as you have, so one scan is enough.
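J-D's answer can be illustrated with a toy sketch. It is in-memory Python, and the `Table` layout and `scan` helper are hypothetical, not HBase code. A single scan that asks for both families returns each row's f1 and f2 cells together, so no second scan is needed. In the Java client, the corresponding step is calling `Scan.addFamily` once per family on the same `Scan` object.

```python
# Toy, in-memory model of one HBase table; NOT the HBase client API.
# Assumed layout: row key -> family -> qualifier -> value.
from typing import Dict, Iterable, Iterator, Tuple

Table = Dict[str, Dict[str, Dict[str, str]]]


def scan(table: Table, families: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Hypothetical helper: one pass over the rows in key order, returning the
    cells of every requested family together, keyed as 'family:qualifier'."""
    wanted = set(families)
    for row in sorted(table):
        cells = {
            f"{family}:{qualifier}": value
            for family, qualifiers in table[row].items()
            if family in wanted
            for qualifier, value in qualifiers.items()
        }
        if cells:  # skip rows with nothing in the requested families, as a real scan does
            yield row, cells
```

For example, suppose "row1" holds data only in f1 and "row2" only in f2. A single `scan(table, ["f1", "f2"])` returns both rows.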

J-D

On Fri, Mar 11, 2011 at 7:56 AM, Mark Kerzner <ma...@gmail.com> wrote:
> J-D,
> when I read from two families, I cannot combine them in one scan, so I would
> have to do two scans, correct?
> Thank you,
> Mark
>
> On Thu, Feb 24, 2011 at 2:13 PM, Mark Kerzner <ma...@gmail.com> wrote:
>>
>> Thanks, J-D, that's the best answer.
>> Merci!
>> Mark

Re: HBase replication

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Ah, OK, so really just master-master... Well, it's possible to do it in
0.90 as long as a family that's replicated from one cluster isn't
replicated back when its edits are inserted in the other. That means you would have to
use two column families and merge the results.

Let's say you have a table "test". On cluster 1 you would create it like this:

create "test", {NAME => 'f1', REPLICATION_SCOPE => '1'}, {NAME => 'f2'}

Then on cluster 2:

create "test", {NAME => 'f1'}, {NAME => 'f2', REPLICATION_SCOPE => '1'}

When you write, always write to one family, and that family depends on the
cluster you're on (f1 on cluster 1, f2 on cluster 2). When you read, always get the
data from both families.
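The reasoning behind the two families can be checked with a small in-memory model. This is a hedged sketch, not HBase's implementation. `Cluster`, `ship`, and `replicate_until_quiet` are hypothetical names. The model assumes, as J-D's explanation implies for 0.90, that an edit received through replication is logged on the receiving cluster. It is then shipped onward if that cluster's scope for the family is 1.

```python
# Toy model of J-D's two-family master-master scheme; NOT HBase code.
# Assumption: an edit applied by replication is logged on the receiving
# cluster like any local write, and each cluster ships a logged edit to its
# peer only if that family has REPLICATION_SCOPE 1 on *that* cluster.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edit = Tuple[str, str, str, str]  # (row, family, qualifier, value)


class Cluster:
    def __init__(self, name: str, scopes: Dict[str, int]):
        self.name = name
        self.scopes = scopes                 # family -> REPLICATION_SCOPE
        self.data: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
        self.log: List[Edit] = []            # stand-in for the write-ahead log
        self.shipped = 0                     # how much of the log was already read
        self.peer: Optional["Cluster"] = None

    def put(self, row: str, family: str, qualifier: str, value: str) -> None:
        self.data[(row, family)][qualifier] = value
        self.log.append((row, family, qualifier, value))

    def ship(self) -> int:
        """Forward unread log entries whose family is scoped 1 here; return the count."""
        sent = 0
        for row, family, qualifier, value in self.log[self.shipped:]:
            if self.scopes.get(family, 0) == 1:
                self.peer.put(row, family, qualifier, value)
                sent += 1
        self.shipped = len(self.log)
        return sent

    def get(self, row: str) -> Dict[str, str]:
        """Read the row from both families and merge them (f2 wins on a clash)."""
        merged: Dict[str, str] = {}
        for family in ("f1", "f2"):
            merged.update(self.data.get((row, family), {}))
        return merged


def replicate_until_quiet(a: Cluster, b: Cluster, max_rounds: int = 10) -> int:
    """Hypothetical driver: ship in both directions until nothing moves."""
    for rounds in range(1, max_rounds + 1):
        if a.ship() + b.ship() == 0:
            return rounds
    raise RuntimeError("edits kept bouncing between the clusters")
```

Consider clusters set up as `Cluster("c1", {"f1": 1, "f2": 0})` and `Cluster("c2", {"f1": 0, "f2": 1})`, matching the `create` commands above. Replication settles, and each cluster's `get` sees rows written on either side. If f1 is instead scoped 1 on both clusters, `replicate_until_quiet` raises, because each edit keeps being shipped back.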

Another option is to have your application write to both clusters itself
(asynchronously or not).
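The dual-write alternative can be sketched as follows. `DualWriter` and `InMemoryClient` are hypothetical, and a real version would wrap an actual client for each cluster. Each put goes to every cluster, either inline or on a small thread pool.

```python
# Sketch of "write yourself to both clusters"; NOT an HBase API.
# Assumption: each client is any object with put(row, family, qualifier, value);
# InMemoryClient below is a hypothetical stand-in for a real cluster client.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Sequence


class InMemoryClient:
    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}

    def put(self, row: str, family: str, qualifier: str, value: str) -> None:
        self.rows.setdefault(row, {})[f"{family}:{qualifier}"] = value


class DualWriter:
    def __init__(self, clients: Sequence, asynchronous: bool = False) -> None:
        self.clients = list(clients)
        self.pool = ThreadPoolExecutor(max_workers=len(self.clients)) if asynchronous else None

    def put(self, row: str, family: str, qualifier: str, value: str) -> List[Future]:
        if self.pool is None:
            # Synchronous: if a later client raises, the earlier ones already hold the write.
            for client in self.clients:
                client.put(row, family, qualifier, value)
            return []
        # Asynchronous: the caller must check the futures, or failed writes go unnoticed.
        return [self.pool.submit(c.put, row, family, qualifier, value) for c in self.clients]

    def close(self) -> None:
        if self.pool is not None:
            self.pool.shutdown(wait=True)
```

Either way, a failure on one cluster leaves the two out of step, so a real application would need retries or a repair pass. Because HBase replication isn't involved here, both clusters can keep a single family.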

J-D

On Thu, Feb 24, 2011 at 10:28 AM, Mark Kerzner <ke...@shmsoft.com> wrote:
> Yes, J-D, your understanding of my understanding is correct.
> The two would actually be the same if all new records in one HBase could be
> copied to the other HBase through log shipping. That is assuming that the
> two databases never get records with the same row key. I did not mean to
> synchronize after the fact, but only as the records are being written, from
> the very beginning.
> If we agree on the question, what would be the solution?
> Thank you,
> Mark

Re: HBase replication

Posted by Jean-Daniel Cryans <jd...@apache.org>.

What you describe is more like the rsync tool, which isn't what HBase
replication does at all. Replication works by log shipping: it only copies
data when it reads it from a log. There's no proactive thread that checks for
differences between the two clusters and copies the missing pieces.
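To make the contrast concrete, here is a toy sketch in Python. The dict tables and the `ship_log` and `rsync_style` helpers are hypothetical. Log shipping forwards only what it reads from the log, whereas an rsync-style pass compares the two sides and fills in whatever is missing.

```python
# Toy contrast between log shipping and an rsync-style diff; NOT HBase code.
# Tables are plain dicts (row -> value); all names here are hypothetical.
from typing import Dict, List, Tuple

LogEntry = Tuple[str, str]  # (row, value)


def ship_log(log: List[LogEntry], position: int, peer: Dict[str, str]) -> int:
    """Log shipping: apply entries after `position` to the peer, return the new
    position. A row that is not in the shipped part of the log never reaches the peer."""
    for row, value in log[position:]:
        peer[row] = value
    return len(log)


def rsync_style(a: Dict[str, str], b: Dict[str, str]) -> None:
    """Diff-based sync: copy rows missing on either side to the other.
    Per J-D, HBase replication has no thread doing this."""
    for row in a.keys() - b.keys():
        b[row] = a[row]
    for row in b.keys() - a.keys():
        a[row] = b[row]
```

Suppose `"old"` was written to the source before its edits started being shipped, and `"new"` after. `ship_log` then gives the peer only `"new"`, while `rsync_style` would also copy `"old"`.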

Is my understanding of your understanding of replication correct?

J-D
