Posted to user@hbase.apache.org by Gaetan Deputier <gd...@ividence.com> on 2013/02/08 01:23:29 UTC

Question regarding bulkload : overwriting duplicate records

Hi HBase users,

I am using HBase 0.92.1 from the Cloudera distribution CDH 4.1.1.
I am bulk loading files with the ImportTsv job, but I have an issue
with duplicate records that have a different cell value.

My guess is that the underlying MapReduce job sets the timestamp to the
current time. Is there a way to tell the ImportTsv job to read the timestamp
from a column?

I can still write my own Hadoop mapper that splits the lines and processes
them, but I was wondering whether the fix for the HBase JIRA issue HBASE-5564,
which solves this problem, will be released soon.

Regards,

G.
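
The custom-mapper approach mentioned in the message above (split each line
and take the timestamp from a column instead of the current time) could look
roughly like the sketch below. It is plain Java with no HBase dependency. The
class and method names (TsvTimestampSketch, parse, newestPerRow) and the
column layout (row key, timestamp in milliseconds, value) are assumptions made
for illustration; this is not the ImportTsv code or the HBASE-5564 patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: shows how a mapper could read the
// cell timestamp from a TSV column rather than using the current time.
final class TsvTimestampSketch {

    /** One parsed record: row key, cell timestamp (ms since epoch), cell value. */
    static final class ParsedLine {
        final String rowKey;
        final long timestamp;
        final String value;

        ParsedLine(String rowKey, long timestamp, String value) {
            this.rowKey = rowKey;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Splits a tab-separated line and reads the timestamp from column tsIdx. */
    static ParsedLine parse(String line, int rowKeyIdx, int tsIdx, int valueIdx) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        int needed = Math.max(rowKeyIdx, Math.max(tsIdx, valueIdx));
        if (fields.length <= needed) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        // Long.parseLong throws NumberFormatException on a malformed timestamp.
        long ts = Long.parseLong(fields[tsIdx].trim());
        return new ParsedLine(fields[rowKeyIdx], ts, fields[valueIdx]);
    }

    /**
     * Keeps, for each row key, the record with the highest timestamp. This
     * mimics what a reader asking for the newest version would see when every
     * cell carries its own timestamp, whatever order the lines arrive in.
     * Assumed layout: column 0 = row key, 1 = timestamp, 2 = value.
     */
    static Map<String, ParsedLine> newestPerRow(Iterable<String> lines) {
        Map<String, ParsedLine> newest = new HashMap<>();
        for (String line : lines) {
            ParsedLine p = parse(line, 0, 1, 2);
            ParsedLine current = newest.get(p.rowKey);
            if (current == null || p.timestamp > current.timestamp) {
                newest.put(p.rowKey, p);
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        // The newer record comes first; it still wins over the older one.
        Map<String, ParsedLine> result = newestPerRow(Arrays.asList(
                "row1\t200\tnew",
                "row1\t100\told"));
        System.out.println(result.get("row1").value);
    }
}
```

In a real MapReduce job, the map step would emit the cell with the parsed
timestamp rather than collecting records in a map; the point here is only
that duplicates stop depending on load order once the timestamp comes from
the data.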

Re: Question regarding bulkload : overwriting duplicate records

Posted by Gaetan Deputier <gd...@ividence.com>.

Thanks, I appreciate it. Keep up the good work.

G.


On Thu, Feb 7, 2013 at 5:04 PM, Ted Yu <yu...@gmail.com> wrote:

> I logged HBASE-7793 to backport.
>
> Cheers

Re: Question regarding bulkload : overwriting duplicate records

Posted by Ted Yu <yu...@gmail.com>.

I logged HBASE-7793 to backport.

Cheers
