Posted to user@hbase.apache.org by Andy Sautins <an...@returnpath.net> on 2011/03/28 18:36:42 UTC

passing timestamp into importtsv...

   We have been having a lot of success using the importtsv utility to load data into HBase as described in the wiki (http://hbase.apache.org/bulk-loads.html).  The one issue we have run into is that we would like to assign a specific timestamp to the records associated with an import.  The current ImportTsv.java class sets the timestamp to the current time ( ts = System.currentTimeMillis() ).  We have been using a patch that, if a system property ( importtsv.timestamp ) is set, takes the timestamp from that property; if the property is not set, it falls back to the current time.  This has been very helpful for us and gives us more control over the timestamps of imported records.
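A minimal sketch of the fallback logic described above, assuming the value is read from JVM system properties and holds epoch milliseconds. This is not the actual patch; only the property name importtsv.timestamp comes from the post, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch (not the actual ImportTsv patch): resolve the import
// timestamp from an optional "importtsv.timestamp" property, falling back to
// the current wall-clock time when the property is absent.
class ImportTimestamp {

    // Property name taken from the post; everything else is illustrative.
    static final String TIMESTAMP_KEY = "importtsv.timestamp";

    /** Returns the configured timestamp if present, otherwise the current time. */
    static long resolveTimestamp(Properties props) {
        String value = props.getProperty(TIMESTAMP_KEY);
        if (value == null || value.trim().isEmpty()) {
            return System.currentTimeMillis();
        }
        // A malformed value throws NumberFormatException instead of silently
        // falling back, so a typo in the property does not go unnoticed.
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        // Reads JVM system properties, e.g. java -Dimporttsv.timestamp=<millis> ...
        long ts = resolveTimestamp(System.getProperties());
        System.out.println("Using timestamp " + ts);
    }
}
```

Failing loudly on a bad value is a design choice of this sketch; a patch could just as reasonably log a warning and use the current time.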

   My question is: is this functionality useful in general?  If so, I'd be happy to submit a JIRA and patch with the appropriate changes.

   Thanks

   Andy

Re: passing timestamp into importtsv...

Posted by Stack <st...@duboce.net>.
This would be generally useful I'd say.
Thank you Andy,
St.Ack


RE: passing timestamp into importtsv...

Posted by Andy Sautins <an...@returnpath.net>.
   Discouraging users from setting timestamps seems to make sense.  In our situation we bulk import every 'x' minutes, and if one of the older imports fails and has to be rerun after a later import has already completed, we would like to load the older records at their original timestamp, which precedes the timestamp of the later import.  It sounds like that may be one of the situations that could trigger some of the internal edge cases, correct?
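To make the retry scenario concrete, here is a toy model (hypothetical classes, not HBase code) of a single cell that keeps timestamped versions and, on a plain read, returns the version with the highest timestamp, which is HBase's default read behaviour. It compares a rerun of the failed import stamped with the current time against one stamped with its original time. It deliberately ignores deletes, version limits and compactions.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not HBase code): one cell holding several
// timestamped versions; a plain read returns the highest-timestamp version.
class VersionedCell {
    // Keyed by timestamp in ascending order, so the last entry is the newest.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        // In this model a second put at the same timestamp replaces the first.
        versions.put(timestamp, value);
    }

    /** Returns the value with the largest timestamp, or null if the cell is empty. */
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }
}

class RetryScenario {
    public static void main(String[] args) {
        long t1 = 1_000L; // slot of import A, which fails the first time
        long t2 = 2_000L; // slot of import B, which succeeds
        long t3 = 3_000L; // wall-clock time when A is finally rerun

        // Rerun stamped with the current time: A's older data shadows B's.
        VersionedCell wallClock = new VersionedCell();
        wallClock.put(t2, "B");
        wallClock.put(t3, "A");
        System.out.println(wallClock.get()); // A

        // Rerun stamped with A's original slot: B stays the latest version.
        VersionedCell explicit = new VersionedCell();
        explicit.put(t2, "B");
        explicit.put(t1, "A");
        System.out.println(explicit.get()); // B
    }
}
```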

   Also, as a separate note: since the timestamp is set in the Mapper, an import that runs more than one mapper won't get a consistent timestamp across all the records of a given load.  For our use case it is helpful to be able to identify all records associated with a given import.
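The per-mapper issue can be illustrated with a sketch in plain Java rather than the Hadoop API (all names hypothetical): the driver resolves one timestamp, either the user-supplied importtsv.timestamp or the current time, and every mapper reads that shared value instead of calling System.currentTimeMillis() itself. A HashMap stands in for what would presumably be the job configuration in a real MapReduce job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not Hadoop/HBase code): pick the load timestamp once in
// the driver and hand it to every mapper. A plain Map stands in for the job
// configuration; the key name comes from the post.
class ConsistentLoadTimestamp {
    static final String TIMESTAMP_KEY = "importtsv.timestamp";

    /** Driver side: keep a user-supplied timestamp, otherwise pick one now. */
    static Map<String, String> configureJob(Map<String, String> userSettings) {
        Map<String, String> jobConf = new HashMap<>(userSettings);
        jobConf.putIfAbsent(TIMESTAMP_KEY, Long.toString(System.currentTimeMillis()));
        return jobConf;
    }

    /** Mapper side: read the shared value rather than consulting the clock. */
    static long mapperTimestamp(Map<String, String> jobConf) {
        return Long.parseLong(jobConf.get(TIMESTAMP_KEY));
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> jobConf = configureJob(new HashMap<>());
        List<Long> stamps = new ArrayList<>();
        for (int mapper = 0; mapper < 3; mapper++) {
            Thread.sleep(5); // simulated mappers start at different moments
            stamps.add(mapperTimestamp(jobConf));
        }
        // Number of distinct timestamps across all simulated mappers.
        System.out.println(stamps.stream().distinct().count()); // 1
    }
}
```

Because every record in the load then shares one timestamp, that value can also be used to pick out all records belonging to a given import.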

   I went ahead and opened a JIRA ( HBASE-3705 ) and uploaded the basic patch.  I'll update the documentation as well.

   Thanks

   Andy

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, March 28, 2011 10:51 AM
To: user@hbase.apache.org
Subject: Re: passing timestamp into importtsv...

I have two thoughts about it:

1- We generally discourage users setting their own timestamps since it
messes with the internals in some edge cases. Adding this
functionality goes against that.
2- Almost every interface we offer lets users set their own
timestamps, so to be more consistent we should indeed offer it for
importtsv.

So I think you should open a jira and post your patch.

J-D


Re: passing timestamp into importtsv...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I have two thoughts about it:

1- We generally discourage users setting their own timestamps since it
messes with the internals in some edge cases. Adding this
functionality goes against that.
2- Almost every interface we offer lets users set their own
timestamps, so to be more consistent we should indeed offer it for
importtsv.

So I think you should open a jira and post your patch.

J-D
