Posted to dev@pig.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2011/01/28 18:42:44 UTC

[jira] Created: (PIG-1832) Support timestamp in HBaseStorage

Support timestamp in HBaseStorage
---------------------------------

                 Key: PIG-1832
                 URL: https://issues.apache.org/jira/browse/PIG-1832
             Project: Pig
          Issue Type: Improvement
         Environment: Java 6, Mac OS X 10.6
            Reporter: Eric Yang


When storing data into HBase using org.apache.pig.backend.hadoop.hbase.HBaseStorage, the HBase timestamp field is set to the insertion time of the MapReduce job. It would be nice to have a way to populate the timestamp from user data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

Posted by "Andrew Clegg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274773#comment-13274773 ] 

Andrew Clegg commented on PIG-1832:
-----------------------------------

This would be really handy, e.g. for replaying log files into HBase after a failure: the cells could then be dated with the actual time of the event.
                

        

[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072427#comment-13072427 ] 

Bill Graham commented on PIG-1832:
----------------------------------

@Vincent, FYI: timestamp filtering at read time is being implemented as part of PIG-2114.


        

[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503522#comment-13503522 ] 

Eric Yang commented on PIG-1832:
--------------------------------

For loading HBase data with timestamp, the API could look like this:

{code}
a = load 'hbase://table1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', 
  '-loadKey -gt $START -caster Utf8StorageConverter -timeRange $startTs,$endTs');
{code}
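
To make the proposed option concrete, here is a minimal Java sketch of how a "-timeRange start,end" value might be parsed and applied. The class name TimeRangeOption and its methods are hypothetical and not part of HBaseStorage. The sketch assumes a half-open interval (start inclusive, end exclusive), which is how HBase's Scan.setTimeRange(minStamp, maxStamp) treats its bounds.

```java
/** Hypothetical holder for a parsed "-timeRange start,end" option; not part of HBaseStorage. */
final class TimeRangeOption {
    final long minStamp; // inclusive lower bound, in milliseconds
    final long maxStamp; // exclusive upper bound, in milliseconds

    TimeRangeOption(long minStamp, long maxStamp) {
        if (minStamp < 0 || maxStamp < minStamp) {
            throw new IllegalArgumentException(
                "invalid time range: " + minStamp + "," + maxStamp);
        }
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    /** Parses a value such as "1000,2000" (whitespace around the numbers is tolerated). */
    static TimeRangeOption parse(String value) {
        String[] parts = value.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'start,end' but got: " + value);
        }
        return new TimeRangeOption(
            Long.parseLong(parts[0].trim()),
            Long.parseLong(parts[1].trim()));
    }

    /** True if the cell timestamp falls inside [minStamp, maxStamp). */
    boolean contains(long cellTimestamp) {
        return cellTimestamp >= minStamp && cellTimestamp < maxStamp;
    }
}
```

In a real loader the two bounds would presumably be handed to the scan rather than checked cell by cell; the contains method is only there to show the intended semantics.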

For storing, I am inclined to suggest passing a new callback user-defined function to HBaseStorage as a parameter. This would make it possible to extract the timestamp from the row key and set it at the cell level. For example:

{code}
STORE table2 INTO 'table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2', 
  '-cb org.apache.pig.backend.hadoop.hbase.TimestampExtractor("\\w+-\\d+-\\w+")');
{code}
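
As a rough illustration of what such a callback might do internally, here is a minimal Java sketch. The class RowKeyTimestampExtractor is hypothetical and is not the proposed TimestampExtractor itself. It also assumes the regex carries one capture group around the timestamp digits, e.g. "\\w+-(\\d+)-\\w+", which the pattern in the example above does not have.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pulls a millisecond timestamp out of a row key via capture group 1. */
final class RowKeyTimestampExtractor {
    private final Pattern pattern;

    RowKeyTimestampExtractor(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /**
     * Returns the timestamp captured by group 1 of the regex, or the given
     * fallback when the row key does not match or the group is not a valid long.
     */
    long extract(String rowKey, long fallback) {
        Matcher m = pattern.matcher(rowKey);
        if (!m.matches() || m.groupCount() < 1) {
            return fallback;
        }
        try {
            return Long.parseLong(m.group(1));
        } catch (NumberFormatException e) {
            // Covers non-numeric captures, overflow, and an unmatched optional group.
            return fallback;
        }
    }
}
```

The store path could then use the returned value as the cell timestamp in place of the default insertion time, with the fallback covering row keys that do not follow the expected layout.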

It could also be used to stamp bulk-loaded data with a fixed timestamp:

{code}
STORE table2 INTO 'table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2', 
  '-cb org.apache.pig.backend.hadoop.hbase.TimestampSetter($ts)');
{code}

Any thoughts?
                

[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

Posted by "Vincent BARAT (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072325#comment-13072325 ] 

Vincent BARAT commented on PIG-1832:
------------------------------------

It would definitely be nice if a timestamp could also be specified when loading data: in the same way that the -lt and -gt options work for row keys, it would be nice to be able to specify a timestamp threshold.
