Posted to dev@phoenix.apache.org by "James Violette (JIRA)" <ji...@apache.org> on 2014/03/26 22:14:15 UTC

[jira] [Created] (PHOENIX-898) Extend PhoenixHBaseStorage to specify upsert columns

James Violette created PHOENIX-898:
--------------------------------------

             Summary: Extend PhoenixHBaseStorage to specify upsert columns
                 Key: PHOENIX-898
                 URL: https://issues.apache.org/jira/browse/PHOENIX-898
             Project: Phoenix
          Issue Type: Improvement
    Affects Versions: 3.0.0
            Reporter: James Violette
             Fix For: 3.0.0


We have a Phoenix table with data from multiple sources.  We would like to write a Pig script that upserts only the data associated with a feed, leaving the other data untouched.  The current PhoenixHBaseStorage always upserts every column in the table.

Given this table schema as an example:
CREATE TABLE IF NOT EXISTS MYSCHEMA.MYTABLE
 (NAME VARCHAR NOT NULL
  ,D.INFO VARCHAR
  ,D.D1 DOUBLE
  ,D.I1 INTEGER
  ,D.C1 VARCHAR
 CONSTRAINT pk PRIMARY KEY (NAME));

Assuming relation 'A' has already been loaded in Pig, the current syntax stores all columns into MYSCHEMA.MYTABLE:
STORE A into 'hbase://MYSCHEMA.MYTABLE' using
    org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');

We could specify the upsert columns as a comma-separated list after the table name in the hbase:// URL.

This column-based example is equivalent to the full-table upsert:
STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.D1,D.I1,D.C1' using
    org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');

This column-based example loads only three of the five columns:
STORE A into 'hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1' using
    org.apache.phoenix.pig.PhoenixHBaseStorage('localhost','-batchSize 5000');
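
To make the intent concrete, here is a minimal sketch of the parameterized UPSERT statement that a column list such as the one above could translate into. The class UpsertStatementSketch and its method buildUpsert are hypothetical names used only for illustration; they are not part of Phoenix, and the real implementation may build the statement differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT part of Phoenix: builds a parameterized
// UPSERT statement restricted to the given columns.
final class UpsertStatementSketch {

    static String buildUpsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        // Column references may carry a column family prefix, e.g. D.INFO.
        String columnList = String.join(",", columns);
        // One bind placeholder per column.
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Prints: UPSERT INTO MYSCHEMA.MYTABLE (NAME,D.INFO,D.I1) VALUES (?,?,?)
        System.out.println(
            buildUpsert("MYSCHEMA.MYTABLE", Arrays.asList("NAME", "D.INFO", "D.I1")));
    }
}
```

With a statement of this shape, columns left out of the list (D.D1 and D.C1 in the three-column example) would not be written by the store.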

This change would touch:
PhoenixHBaseStorage.setStoreLocation - parse the columns.
PhoenixPigConfiguration.configure - add an optional column list parameter.
PhoenixPigConfiguration.setup - create the upsert statement and the column metadata list.

The rest of the code should work as-is.
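
The parsing step proposed for setStoreLocation could look roughly like the sketch below. It assumes the location has the form hbase://TABLE or hbase://TABLE/COL1,COL2,... as in the examples above. The class StoreLocationSketch and its members are hypothetical names for illustration only and do not reflect the actual PhoenixHBaseStorage code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the actual PhoenixHBaseStorage code: splits a
// store location into a table name and an optional list of upsert columns.
final class StoreLocationSketch {
    private static final String SCHEME = "hbase://";

    final String tableName;
    final List<String> columns; // empty list means "upsert all columns"

    private StoreLocationSketch(String tableName, List<String> columns) {
        this.tableName = tableName;
        this.columns = columns;
    }

    static StoreLocationSketch parse(String location) {
        if (!location.startsWith(SCHEME)) {
            throw new IllegalArgumentException("expected a location starting with " + SCHEME);
        }
        String rest = location.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        String table = slash < 0 ? rest : rest.substring(0, slash);
        if (table.isEmpty()) {
            throw new IllegalArgumentException("missing table name in " + location);
        }
        List<String> columns = new ArrayList<>();
        if (slash >= 0) {
            // Everything after the slash is a comma-separated column list.
            for (String column : rest.substring(slash + 1).split(",")) {
                String trimmed = column.trim();
                if (!trimmed.isEmpty()) {
                    columns.add(trimmed);
                }
            }
        }
        return new StoreLocationSketch(table, columns);
    }

    public static void main(String[] args) {
        StoreLocationSketch loc = parse("hbase://MYSCHEMA.MYTABLE/NAME,D.INFO,D.I1");
        System.out.println(loc.tableName + " -> " + loc.columns);
    }
}
```

Treating an absent column list as "all columns" would keep existing scripts that use the plain hbase://MYSCHEMA.MYTABLE form working unchanged.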





