
Posted to user@hbase.apache.org by Sam Seigal <se...@yahoo.com> on 2011/08/03 02:13:35 UTC

quick query about importtsv

Hi All,

I am using the importtsv tool to load some data into an hbase cluster. Some
of the row keys + cf:qualifier might occur more than once with a different
value in the files I have generated. I would expect this to just create two
versions of the record with the different values. However, I am only seeing
one version (the latest one) being created in the table. Is this expected
from the tool? It does not look like it from the code ...

My table is set with the default to maintain up to 3 versions.

Is this behavior expected?

Thank you,

Sam

Re: quick query about importtsv

Posted by "Gan, Xiyun" <ga...@gmail.com>.

Hi Seigal,
  The importtsv tool is not applicable to your case. For advanced usage of
bulk load, please dig into ImportTsv.java and check the JavaDoc for
HFileOutputFormat. Also,
https://issues.apache.org/jira/browse/HBASE-1861 is helpful if
multi-family support is required.

-- 
Best wishes
Gan, Xiyun

Re: quick query about importtsv

Posted by Sam Seigal <se...@yahoo.com>.

I just noticed that all KeyValues written from a single map instance for
importtsv have the same version timestamp. I think this will not produce
multiple versions if the same row keys are located in the same mapper chunk.
Why not use a new version timestamp for every put? Is there a specific
reason to use the same version timestamp for all lines passed into the
mapper?
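
To illustrate the effect described above, here is a minimal sketch in plain
Java. It is a simplified model, not HBase's or ImportTsv's actual code: the
class VersionCollapseDemo and its write() helper are hypothetical names, and
a column is modelled as a map from timestamp to value. Because a version is
identified by row + cf:qualifier + timestamp, two writes that share a
timestamp occupy the same slot and only one value remains. The model keeps
the last write; which value survives in a real cluster may depend on the
write path.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionCollapseDemo {

    /**
     * Hypothetical stand-in for the versions of one row + cf:qualifier.
     * Writes values[i] at timestamps[i]; a repeated timestamp replaces
     * the earlier value instead of adding a new version.
     */
    static TreeMap<Long, String> write(String[] values, long[] timestamps) {
        // Newest timestamp first, mirroring how versions are usually read back.
        TreeMap<Long, String> column = new TreeMap<>(Comparator.reverseOrder());
        for (int i = 0; i < values.length; i++) {
            column.put(timestamps[i], values[i]);
        }
        return column;
    }

    public static void main(String[] args) {
        String[] values = {"v1", "v2"};
        long ts = 1000L; // arbitrary timestamp, assumed to be shared by one map task

        // Same timestamp for both lines: the two writes collapse into one version.
        System.out.println("shared timestamp:   " + write(values, new long[] {ts, ts}).size() + " version(s)");

        // Distinct timestamps: both versions are kept.
        System.out.println("distinct timestamp: " + write(values, new long[] {ts, ts + 1}).size() + " version(s)");
    }
}
```

In this model, giving each put its own timestamp is what allows both values
to coexist, up to the table's configured maximum number of versions.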
