Posted to dev@pig.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2011/01/28 18:42:44 UTC
[jira] Created: (PIG-1832) Support timestamp in HBaseStorage
Support timestamp in HBaseStorage
---------------------------------
Key: PIG-1832
URL: https://issues.apache.org/jira/browse/PIG-1832
Project: Pig
Issue Type: Improvement
Environment: Java 6, Mac OS X 10.6
Reporter: Eric Yang
When storing data into HBase using org.apache.pig.backend.hadoop.hbase.HBaseStorage, the HBase timestamp field is set to the insertion time of the MapReduce job. It would be nice to have a way to populate the timestamp from user data.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage
Posted by "Andrew Clegg (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274773#comment-13274773 ]
Andrew Clegg commented on PIG-1832:
-----------------------------------
This would be really handy, e.g. for replaying log files into HBase after a failure, so that the cells could be dated with the actual time of the event.
[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage
Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072427#comment-13072427 ]
Bill Graham commented on PIG-1832:
----------------------------------
@Vincent, FYI: timestamp filtering at read time is being implemented as part of PIG-2114.
[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage
Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503522#comment-13503522 ]
Eric Yang commented on PIG-1832:
--------------------------------
For loading HBase data with timestamp, the API could look like this:
{code}
a = load 'hbase://table1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*',
'-loadKey -gt $START -caster Utf8StorageConverter -timeRange $startTs,$endTs');
{code}
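As a rough sketch of how a "start,end" value for the proposed -timeRange option might be parsed and applied, the Java below uses a hypothetical TimeRangeOption class (not part of Pig or HBase). It assumes the range is half-open, start inclusive and end exclusive, which mirrors how HBase's Scan.setTimeRange treats its bounds; the proposal itself does not say which semantics are intended.

```java
// Hypothetical helper, not part of Pig or HBase: parses a "-timeRange start,end" value.
final class TimeRangeOption {
    final long minStamp; // inclusive lower bound (assumption)
    final long maxStamp; // exclusive upper bound (assumption)

    private TimeRangeOption(long minStamp, long maxStamp) {
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    // Accepts exactly two comma-separated long values, e.g. "1000,2000".
    static TimeRangeOption parse(String value) {
        String[] parts = value.split(",", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start,end but got: " + value);
        }
        long min = Long.parseLong(parts[0].trim());
        long max = Long.parseLong(parts[1].trim());
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("invalid time range: " + value);
        }
        return new TimeRangeOption(min, max);
    }

    // True when a cell timestamp falls inside [minStamp, maxStamp).
    boolean includes(long cellTimestamp) {
        return cellTimestamp >= minStamp && cellTimestamp < maxStamp;
    }
}
```

With this sketch, TimeRangeOption.parse("1000,2000") would accept a cell stamped 1000 and reject one stamped 2000.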
For storing, I am inclined to suggest a new callback user-defined function passed to HBaseStorage as a parameter. This would make it possible to extract the timestamp from the row key and set it at the cell level. For example:
{code}
STORE table2 INTO 'table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2',
'-cb org.apache.pig.backend.hadoop.hbase.TimestampExtractor("\\w+-\\d+-\\w+")');
{code}
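To make the callback idea more concrete, here is a minimal Java sketch of what such an extractor could do with a row key. The class name RowKeyTimestampExtractor and its extract method are hypothetical, and the sketch assumes the regex carries one capturing group around the epoch-millisecond digits (e.g. "\\w+-(\\d+)-\\w+"); the pattern in the example above has no group, so how the timestamp would be picked out of the match is an open question.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Pig or HBase.
final class RowKeyTimestampExtractor {
    private final Pattern pattern;

    // Assumption: capturing group 1 of the regex holds the timestamp digits.
    RowKeyTimestampExtractor(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns the timestamp found in the row key, or empty if the key does not match.
    OptionalLong extract(String rowKey) {
        Matcher m = pattern.matcher(rowKey);
        if (!m.matches() || m.groupCount() < 1) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(m.group(1)));
        } catch (NumberFormatException e) {
            // Group was absent, non-numeric, or too large for a long.
            return OptionalLong.empty();
        }
    }
}
```

A store function could then fall back to the default insertion-time behaviour whenever extract returns empty, rather than failing the whole job on a malformed key.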
The callback could also be used to stamp bulk-loaded data with a supplied timestamp:
{code}
STORE table2 INTO 'table2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:s1 cf2:s2',
'-cb org.apache.pig.backend.hadoop.hbase.TimestampSetter($ts)');
{code}
Any thoughts?
[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage
Posted by "Vincent BARAT (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072325#comment-13072325 ]
Vincent BARAT commented on PIG-1832:
------------------------------------
It would definitely be nice if a timestamp could also be specified when loading data: just as the -lt and -gt options work for row keys, it would be nice to be able to specify a timestamp threshold.