Posted to dev@phoenix.apache.org by "Samarth Jain (JIRA)" <ji...@apache.org> on 2017/02/02 00:43:51 UTC

[jira] [Created] (PHOENIX-3645) Build a mechanism for creating a table and populating it with data from a source table

Samarth Jain created PHOENIX-3645:
-------------------------------------

             Summary: Build a mechanism for creating a table and populating it with data from a source table
                 Key: PHOENIX-3645
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3645
             Project: Phoenix
          Issue Type: New Feature
            Reporter: Samarth Jain


As part of PHOENIX-1598, we are introducing the capability of mapping column names and encoding column values. For users to be able to use this new scheme, they would need to recreate their tables from scratch. For situations like this, it would be nice to have a mechanism where we can create a new table and populate it with the data of the existing table.

A simple possibility is to disable the source table, take a snapshot of it, create the new table from the snapshot of the old table, and drop the old table. However, this would require downtime.
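For reference, the snapshot-based flow above maps onto standard HBase shell commands roughly as follows (table and snapshot names are illustrative; this is a sketch of the sequence, not a recommended procedure):

```
# Run inside the hbase shell. 'SOURCE_TABLE' / 'NEW_TABLE' are placeholder names.
disable 'SOURCE_TABLE'                          # take the table offline (this is the downtime)
snapshot 'SOURCE_TABLE', 'source_snap'          # point-in-time snapshot of the disabled table
clone_snapshot 'source_snap', 'NEW_TABLE'       # materialize the snapshot as a new table
drop 'SOURCE_TABLE'                             # remove the old table once the clone is verified
```

The table is unavailable for writes between the disable and the cutover, which is the downtime the description refers to.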

Another way would be to use an UPSERT INTO TARGET TABLE SELECT * FROM SOURCE TABLE statement or a map reduce job to bulk load the data. These mechanisms, though, have the inherent limitation that they miss updates made to the old table after they were kicked off or after they completed. To handle the case of these missing updates, a somewhat crazy idea would be to mark the new table as an index on the existing table. The index table would have the exact same schema as the data table. Incremental changes would then be automatically taken care of by our index maintenance mechanism. We could then use our existing map reduce index build job to bulk load the "old" data into the new table.
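As a sketch, the SQL-level copy mentioned above is a single statement (table names are illustrative), and the index-based variant would reuse the existing MR index build job:

```sql
-- Option 1: copy via Phoenix SQL. Runs online, but misses writes that
-- land in SOURCE_TABLE after the backing scan has started.
UPSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE;

-- Option 2 would instead bulk-build TARGET_TABLE as an "index" via the
-- existing MR job, roughly (invocation per the Phoenix IndexTool):
--   hbase org.apache.phoenix.mapreduce.index.IndexTool \
--     --data-table SOURCE_TABLE --index-table TARGET_TABLE \
--     --output-path /tmp/index-build
```

Both are sketches under the assumption that the target table already exists with the new column mapping/encoding scheme.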

There is a slight chance that we would miss updates happening to the source table while we are in the process of doing the index->table conversion.

One way to handle that would be to store the physical HBase table name for a Phoenix table in SYSTEM.CATALOG. The reducer of the map reduce job would then simply have to change this mapping in the SYSTEM.CATALOG table. This should cause new updates to go to the new HBase table.
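To illustrate the catalog-level cutover being proposed: SYSTEM.CATALOG does not currently carry a physical-table-name column, so the PHYSICAL_TABLE_NAME column below is hypothetical, as are the table names; only the row-key columns (TENANT_ID, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, COLUMN_FAMILY) are the existing ones.

```sql
-- Hypothetical: at the end of the copy, the reducer flips the
-- logical -> physical mapping so new writes land in the new table.
-- PHYSICAL_TABLE_NAME is an assumed new column, not an existing one.
UPSERT INTO SYSTEM.CATALOG
    (TENANT_ID, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, COLUMN_FAMILY,
     PHYSICAL_TABLE_NAME)
VALUES
    (NULL, 'MY_SCHEMA', 'MY_TABLE', NULL, NULL, 'MY_TABLE_V2');
```

Clients would also need to invalidate their cached PTable metadata so they pick up the new physical mapping.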

There are probably some edge cases or gotchas that I am not thinking about right now. [~jamestaylor] probably has more thoughts on this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)