Posted to user@phoenix.apache.org by Vamsi Krishna <va...@gmail.com> on 2016/06/28 14:09:46 UTC

How to deal with Phoenix Secondary Indexes & HBase snapshots in batch job

Team,

We are using HDP 2.3.2 (HBase 1.1.2, Phoenix 4.4.0).

We have two Phoenix tables 'TABLE_A', 'TABLE_B' and a Phoenix view
'TABLE_VIEW'.
The Phoenix view always points to one of these two Phoenix tables, which we
call the Active table; the other one is the Standby table.
We have a batch job (an Oozie workflow) that runs every night to process
some data files and insert the data into a Phoenix table.
In one of the Oozie actions we:
1. Figure out which Phoenix table is Active and which is Standby, by querying
   the Phoenix metadata table (SYSTEM.CATALOG) to check which table the
   Phoenix view is pointing to.
2. Drop the Phoenix Standby table.
3. Create an HBase snapshot of the HBase Active table.
4. Clone that snapshot to create the HBase Standby table.
5. Create the Phoenix Standby table on top of the HBase Standby table cloned
   in the previous step.
By this point we have the Phoenix Standby table in the same state as the
Phoenix Active table without any actual movement of data.
6. Process the new data files and insert the data into the Phoenix Standby
   table using the Phoenix CsvBulkLoadTool.
7. At the end, flip the Phoenix view to point to the Phoenix Standby table.
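To make this concrete, here is a rough sketch of the steps as I would run
them by hand (not our actual Oozie code). The ZooKeeper quorum, file paths,
snapshot names, the column list in step 5 and the SYSTEM.CATALOG query in
step 1 are all placeholders/assumptions on my side:

  # 1. Find which table the view currently points to (assumption: reading
  #    VIEW_STATEMENT from SYSTEM.CATALOG for the view's row)
  echo "SELECT TABLE_NAME, VIEW_STATEMENT FROM SYSTEM.CATALOG WHERE TABLE_NAME = 'TABLE_VIEW' AND VIEW_STATEMENT IS NOT NULL;" > find_active.sql
  /usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure find_active.sql

  # Suppose TABLE_A turns out to be Active, so TABLE_B is the Standby table.

  # 2. Drop the Phoenix Standby table
  echo "DROP TABLE IF EXISTS TABLE_B;" > drop_standby.sql
  /usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure drop_standby.sql

  # 3 + 4. Snapshot the HBase Active table and clone it as the Standby table
  echo "snapshot 'TABLE_A', 'TABLE_A_SNAP'" | hbase shell
  echo "clone_snapshot 'TABLE_A_SNAP', 'TABLE_B'" | hbase shell

  # 5. Re-create the Phoenix Standby table over the cloned HBase table
  #    (same DDL as the Active table; this column list is just an example)
  echo "CREATE TABLE TABLE_B (ID VARCHAR PRIMARY KEY, COL1 VARCHAR, COL2 VARCHAR);" > create_standby.sql
  /usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure create_standby.sql

  # 6. Bulk load the new data files into the Standby table
  hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table TABLE_B --input /data/incoming/2016-06-28 \
    --zookeeper zk1:2181:/hbase-unsecure

  # 7. Flip the view to point to the Standby table
  echo "DROP VIEW TABLE_VIEW; CREATE VIEW TABLE_VIEW AS SELECT * FROM TABLE_B;" > flip_view.sql
  /usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure flip_view.sql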

*New requirement:*
With a need for a secondary access pattern, we are planning to add a
secondary index (a local index) on one of the columns of the Phoenix table.
Now, in the Oozie action detailed above, recreating the HBase Standby table
from a snapshot of the HBase Active table and recreating the Phoenix Standby
table on top of it is not going to create the secondary index on the Phoenix
Standby table.
This is because the data table and the index table are completely
independent in HBase. Please correct me if my assumption is wrong.
One option that I can think of here is to create the secondary index on the
Phoenix Standby table after processing the data files and inserting the data
using the Phoenix CsvBulkLoadTool.
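For example (the index and column names here are just placeholders on my
side):

  echo "CREATE LOCAL INDEX IDX_TABLE_B_COL1 ON TABLE_B (COL1);" > create_index.sql
  /usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure create_index.sql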
But as the table volume keeps growing, this index build is going to take
more and more time.
What are the other alternative solutions for this scenario?

*Idea:*
After recreating the HBase Standby table from a snapshot of the HBase Active
table and recreating the Phoenix Standby table on top of it, also create the
HBase index table for the Standby table from a snapshot of the HBase index
table of the Active table.
Then create the secondary index on the Phoenix Standby table pointing to the
HBase index table created above.
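Roughly, what I have in mind is something like this sketch. I'm assuming the
local index data lives in its own HBase table (named something like
_LOCAL_IDX_TABLE_A on Phoenix 4.4; that name is my guess), and I don't know
whether CREATE LOCAL INDEX would attach to a pre-populated table instead of
rebuilding it, which is really what I'm asking:

  # snapshot the Active table's index table and clone it for the Standby table
  echo "snapshot '_LOCAL_IDX_TABLE_A', 'IDX_A_SNAP'" | hbase shell
  echo "clone_snapshot 'IDX_A_SNAP', '_LOCAL_IDX_TABLE_B'" | hbase shell

  # then (hopefully) have Phoenix pick up the already-populated index table
  # instead of rebuilding the index from scratch
  echo "CREATE LOCAL INDEX IDX_TABLE_B_COL1 ON TABLE_B (COL1);" > create_index.sql
  /usr/hdp/current/phoenix-client/bin/sqlline.py zk1:2181:/hbase-unsecure create_index.sql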
Is this possible?


Thanks,

Vamsi Attluri
-- 
Vamsi Attluri

Re: How to deal with Phoenix Secondary Indexes & HBase snapshots in batch job

Posted by vikashtalanki <vt...@visa.com>.
Hi Vamsi,

Can you please explain what is the rationale behind maintaining two Phoenix
tables?
If I'm not wrong, maintaining a single Phoenix table with a single view on it
and updating/inserting the new data into that table with the Oozie job should
work fine.
The Phoenix CsvBulkLoadTool also populates all indexes of a table, if there
are any, so there is no need to write any other job to populate the index
tables separately (example invocation below).
But yes, inserting data into the data table plus the index tables takes more
time than inserting into the data table alone.
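For example, a single run like this (jar path and ZooKeeper quorum are just
examples for an HDP install; adjust to your layout) writes both the data
table and its index tables, so no separate index-population step is needed:

  hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table TABLE_A \
    --input /data/incoming/2016-06-28 \
    --zookeeper zk1:2181:/hbase-unsecure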


