Posted to user@phoenix.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2016/06/09 13:49:52 UTC

Table replication

Hi,

When Phoenix is used, what is the recommended way to do replication?

Replication acts as a client on the 2nd cluster, so should we simply
configure Phoenix on both clusters and let the destination take care of
updating the index tables, etc.? Or should all the tables, including the
Phoenix tables, be replicated to the destination side too? I searched a
bit for that on the Phoenix site and Google and did not find anything.

Thanks,

JMS

Re: Table replication

Posted by James Taylor <ja...@apache.org>.
Hi JM,
Are you looking toward replication to support DR? If so, you can rely on
HBase-level replication with a few gotchas and some operational hurdles:

- When upgrading Phoenix versions, upgrade the server-side first for both
the primary and secondary cluster. You can do a rolling upgrade and old
clients will continue to work with the upgraded server, so no downtime is
required (see Backward Compatibility[1] for more details).
- Execute Phoenix DDL (i.e. user-level changes to existing Phoenix tables,
creation of new tables, indexes, sequences) against both the primary and
secondary cluster with replication suspended (as otherwise you end up with
a race condition for the replication of the SYSTEM.CATALOG table and any
not yet existing tables). If you've upgraded Phoenix, then even if there's
no DDL, you should at a minimum connect a Phoenix client to both the
primary and secondary cluster to trigger any upgrades to Phoenix system
tables. Once the DDL is complete, resume replication.
- Do not replicate the SYSTEM.SEQUENCE table since replication is
asynchronous and may fall behind which would be a big issue if switching
over to the secondary cluster as sequence values could start repeating.
Instead, incorporate a cluster ID into any sequence-based identifiers and
concatenate this with the sequence value. In that way, the identifiers will
continue to be unique after a DR event.
- Replicate Phoenix indexes just like data tables as the HBase-level
replication of the data table will not trigger index updates.
- In theory, you really only need to replicate the views from
SYSTEM.CATALOG, since you're executing DDL on both the primary and
secondary clusters; however, I don't think HBase has that capability (but
it sure would be nice). FWIW, we're thinking of separating views from table definitions into
separate Phoenix tables but need to first make these tables transactional
(we're using an HBase mechanism that allows all or none commits to the
SYSTEM.CATALOG, but it only works if all updates are to the same RS which
is too limiting).
- It's a good idea to monitor the depth of the replication queue so you
know if/when replication is falling behind.
- Care has to be taken wrt keeping deleted cells on both clusters if you
want to support point-in-time backup and restore, as it's possible that
compaction would remove cells before your backup window has passed (this
is orthogonal to replication, but just wanted to bring it up).
- Given the asynchronous nature of HBase replication, there's no good way
of knowing the transaction ID (i.e. timestamp) at which you have all of the
data. Also, replication of the state that is kept by the transaction
manager in terms of inflight and invalid transactions is left as an
exercise to the reader. :-) In short - there's still some work to do wrt
the combination of transactions and replication (but it'd be really
interesting work if anyone is interested).
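
For reference, the suspend-DDL-resume and monitoring steps above map onto
hbase shell commands roughly as follows. This is a sketch; the peer ID '1'
is illustrative:

```
# On the primary cluster, before running DDL:
hbase> disable_peer '1'

# ... run the same Phoenix DDL against both the primary and secondary ...

hbase> enable_peer '1'

# Check replication progress / queue depth:
hbase> status 'replication'
```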
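
The cluster-ID suggestion above can be sketched as follows. The email only
says to concatenate a cluster ID with the sequence value; the fixed 8/56-bit
numeric split and the class name here are my own assumptions, not anything
Phoenix prescribes:

```java
// Fold a cluster ID into sequence-based identifiers so that values
// generated independently on the primary and DR clusters can never
// collide, even after a failover.
public final class ClusterSequenceId {
    private static final int CLUSTER_BITS = 8;                      // up to 256 clusters
    private static final long SEQ_MASK = (1L << (64 - CLUSTER_BITS)) - 1;

    // Pack the cluster ID into the high bits, the sequence value below it.
    public static long encode(int clusterId, long sequenceValue) {
        if (clusterId < 0 || clusterId >= (1 << CLUSTER_BITS)) {
            throw new IllegalArgumentException("cluster ID out of range");
        }
        if (sequenceValue < 0 || sequenceValue > SEQ_MASK) {
            throw new IllegalArgumentException("sequence value out of range");
        }
        return ((long) clusterId << (64 - CLUSTER_BITS)) | sequenceValue;
    }

    // Recover the cluster that generated an identifier.
    public static int clusterOf(long id) {
        return (int) (id >>> (64 - CLUSTER_BITS));
    }

    // Recover the underlying sequence value.
    public static long sequenceOf(long id) {
        return id & SEQ_MASK;
    }
}
```

With this scheme, the same SYSTEM.SEQUENCE value issued on both clusters
still yields distinct identifiers, so SYSTEM.SEQUENCE itself never needs to
be replicated.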

HTH. Thanks,

James

[1] https://phoenix.apache.org/upgrading.html


Re: Table replication

Posted by anil gupta <an...@gmail.com>.
Hi Jean,

Phoenix does not support replication at present (it would be super awesome
if it did). So, if you want to replicate Phoenix tables you will need to
set up replication of all the underlying HBase tables for the
corresponding Phoenix tables.

I think you will need to replicate all the Phoenix system HBase tables,
the global/local secondary index tables, and then the primary Phoenix table.

I haven't done it yet, but the above is the way I would approach it.
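
For a data table and its index, that might look something like this in the
hbase shell. This is a sketch: the peer ID, ZooKeeper quorum, and table
names are illustrative, and '0' is Phoenix's default column family:

```
hbase> add_peer '1', CLUSTER_KEY => "drzk1,drzk2,drzk3:2181:/hbase"
hbase> alter 'MY_TABLE', {NAME => '0', REPLICATION_SCOPE => 1}
hbase> alter 'IDX_MY_TABLE', {NAME => '0', REPLICATION_SCOPE => 1}
```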

Thanks,
Anil Gupta.



-- 
Thanks & Regards,
Anil Gupta