Posted to dev@phoenix.apache.org by Nick Dimiduk <nd...@gmail.com> on 2019/05/04 01:19:12 UTC

Re: About mapping a phoenix table

Hi Reid,

Congratulations on getting it done, and thank you for reporting back with
the details of your success!

Thanks,
Nick

On Sat, Apr 27, 2019 at 8:17 PM Reid Chan <re...@outlook.com> wrote:

> Many thanks to Anil, Nick, Varun.
>
> I successfully finished the table migration from Phoenix 4.7 to Phoenix
> 4.14 by overcoming two problems:
> a. a table from 4.7 without namespace mapping can be mapped to a
> namespace-mapped table.
> b. a table from 4.7 can be read in 4.14.
>
> And here I'd like to share my experience:
>
> 1. In order to support `a`, I added a new feature to hbck. What it does is
> modify the metadata recorded in the table's .tableinfo and .regioninfo
> files. Then a command like `hbck -upgradeNamespace(new feature) -fixMeta
> -fixAssignments` will move the table from `default:NAMESPACE.TABLE` to
> `NAMESPACE:TABLE` from hbase's perspective.
>
> 2. Solving `b` was a pretty hard job, since I didn't have good knowledge
> of the phoenix source code, but after a lot of code tracking I finally
> found the cause, which is tricky. The reason a table from 4.7 can't be
> read in 4.14 is `QualifierEncodingScheme`. 4.7 doesn't have that feature
> (it was introduced in 4.10), so the underlying data in the hfiles, of
> course, doesn't have encoded qualifiers either, while 4.14 enables this
> feature by default; the SCAN constructed in 4.14 will use the encoded
> qualifier as the column condition, which reads nothing from hbase. So the
> solution is adding `COLUMN_ENCODED_BYTES=NONE` at the end of the original
> CREATE statement.
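>
> For example, a minimal sketch of such a statement (the schema, table, and
> column names here are hypothetical placeholders; the point is only the
> trailing COLUMN_ENCODED_BYTES=NONE property):
>
>   CREATE TABLE NS.MIGRATED_TABLE (
>       ID VARCHAR NOT NULL PRIMARY KEY,
>       CF.COL1 VARCHAR,
>       CF.COL2 UNSIGNED_LONG
>   ) COLUMN_ENCODED_BYTES=NONE;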
>
> If the hbase community allows, I'm willing to contribute my new feature to
> hbck; then we may have an official solution for table migration. (Or if it
> doesn't count as a solution, we could at least have a document?)
>
> Let me list the procedure for table migration from a lower-version (<4.10)
> to a higher-version (>=4.10) Phoenix (a rough command sketch follows the
> list):
> 1. Take a snapshot on the src cluster, export the snapshot to the dst
> cluster, and restore it on the dst cluster.
> 2. Run `-upgradeNamespace` on the dst cluster. (Not needed if you don't
> want the namespace mapping feature.)
> 3. CREATE TABLE ***, COLUMN_ENCODED_BYTES=NONE;
> 4. Enjoy your queries.
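>
> A rough sketch of steps 1 and 2, assuming a snapshot named 'table_snap'
> and placeholder cluster addresses (the -upgradeNamespace option is the
> custom hbck feature described above, not a stock hbck flag):
>
>   # src cluster, hbase shell: take the snapshot
>   snapshot 'NAMESPACE.TABLE', 'table_snap'
>
>   # src cluster, command line: copy the snapshot to the dst cluster
>   hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
>     -snapshot table_snap -copy-to hdfs://dst-cluster:8020/hbase
>
>   # dst cluster, hbase shell: restore the snapshot
>   restore_snapshot 'table_snap'
>
>   # dst cluster, command line: optional namespace upgrade (custom feature)
>   hbck -upgradeNamespace -fixMeta -fixAssignments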
>
>
> Cheers!
>
>
> --------------------------
>
> Best regards,
> R.C
>
>
>
> ________________________________________
> From: Varun Rao <va...@cloudera.com.INVALID>
> Sent: 03 April 2019 00:31
> To: dev@phoenix.apache.org
> Subject: Re: About mapping a phoenix table
>
> Hello Reid,
>
> I don't know if this fits your use case, but there is a way of copying data
> from a Phoenix table to another Phoenix table in another cluster if data is
> not yet present in either table.
>
> We can use the fact that Phoenix stores its data in HBase tables.
> Therefore, by enabling replication on the underlying source HBase table,
> adding the destination cluster as a peer in the source hbase cluster, and
> setting the source hbase table's replication scope to 1, any data flowing
> into the source phoenix table will be copied to the destination phoenix
> table.
>
> 1) Create the Phoenix tables on the source and destination clusters
> 2) Set replication=true on the source hbase cluster through CM
> 3) Add a peer on the source cluster via the hbase shell
>
>
> add_peer '1', CLUSTER_KEY => "some_node:2181:/hbase", TABLE_CFS =>
>   {"TABLE_NAME" => ["column_1", "column_2"]}
>
> Here you can specify which column families you would like to copy over
>
> 4) Disable the source Phoenix table, set its replication scope to 1, and
> re-enable it
>
> disable 'BIGTABLE_PHOENIX'
>
> alter 'BIGTABLE_PHOENIX', {NAME => 'w_a', REPLICATION_SCOPE => 1},
>   {NAME => 'w_b', REPLICATION_SCOPE => 1}
>
> enable 'BIGTABLE_PHOENIX'
>
> 5) Send data. In my test case I used psql to send 2 million records from
> CSV to the phoenix source table
>
> phoenix-psql.py -t BIGTABLE_PHOENIX localhost:2181 wine_mag.csv
>
>
> 6) You can now see the same data in both the source and target clusters
>
>
> Thanks
> Yours Truly,
> Varun Rao
>
>
> On Tue, Apr 2, 2019 at 12:04 PM Nick Dimiduk <nd...@gmail.com> wrote:
>
> > Hi Reid,
> >
> > I'll throw my +1 onto Anil's Approach #1. I followed this path recently
> > to migrate all of our production data. Migrating Phoenix metadata by
> > creating tables manually on the destination is a little clunky, but
> > HBase Snapshots are quite easy to work with.
> >
> > Good luck,
> > Nick
> >
> > On Tue, Apr 2, 2019 at 5:26 AM anil gupta <an...@gmail.com> wrote:
> >
> > > Hey Reid,
> > > AFAIK, there is no official Phoenix tool to copy tables between
> > > clusters. IMO, it would be great to have an official tool to copy
> > > tables. In our case, the source and destination clusters are both
> > > running Phoenix 4.7. IMO, copying between 4.7 and 4.14 might have some
> > > version incompatibilities, so you might need to test the following in
> > > non-prod first.
> > >
> > > Approach 1: We usually move tables by taking a snapshot of the hbase
> > > table, exporting the snapshot to the remote cluster, creating the
> > > Phoenix table, deleting the underlying hbase table, and restoring the
> > > snapshot. Please keep in mind that you will need to do a similar
> > > exercise if your table has secondary indexes, since they are stored in
> > > another hbase table. Also, make sure that you don't have any live
> > > traffic to the Phoenix table in the destination cluster until the
> > > snapshot is restored and the data in the table is verified.
> > >
> > > Approach 2: Use the CopyTable utility of hbase. In this case, you will
> > > just need to create the Phoenix table on the remote cluster and then
> > > kick off hbase CopyTable (a rough command sketch follows). In this
> > > approach, too, you will need to run CopyTable for each secondary index.
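> > >
> > > A minimal sketch of such a CopyTable run (the destination zookeeper
> > > quorum and the table name are placeholders; repeat it for each index
> > > table as well):
> > >
> > >   hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
> > >     --peer.adr=dst-zk1,dst-zk2,dst-zk3:2181:/hbase "NAMESPACE.TABLE"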
> > >
> > > We usually use Approach 1 because it's usually faster and doesn't put
> > > write load on the cluster.
> > >
> > > HTH,
> > > Anil Gupta
> > >
> > > > On Apr 2, 2019, at 4:32 AM, Reid Chan <re...@outlook.com> wrote:
> > > >
> > > > Hi team,
> > > >
> > > > I'm trying to transport a phoenix table between two clusters by
> > > > copying all the related hbase files on hdfs from cluster A to
> > > > cluster B.
> > > > But after I executed the CREATE TABLE statement in phoenix, phoenix
> > > > failed to map those files into the table, and `select *` returned
> > > > nothing.
> > > >
> > > > The questions are:
> > > > Is there a proper way or tool to do this kind of table transportation?
> > > > If the answer is no, can the team provide some code pointers in case I
> > > > want to implement it?
> > > > Or the reason why this is infeasible?
> > > >
> > > > FYI,
> > > > both hbase versions are 1.x but differ in minor version,
> > > > while the phoenix version gap is huge: 4.7.0 vs 4.14.1.
> > > >
> > > > Any suggestions are appreciated!
> > > > Thanks
> > > >
> > > >
> > > > --------------------------
> > > >
> > > > Best regards,
> > > > R.C
> > > >
> > > >
> > >
> >
>