Posted to user@hbase.apache.org by Jacques <wh...@gmail.com> on 2011/08/05 23:10:02 UTC

Quick Question about Bulk loading of HFiles & Timestamps

Can someone confirm that bulk loading HFiles preserves cell timestamps, so
that a later load does not overwrite newer values?

For example:
I run MapReduce job A on Monday.
I run MapReduce job B on Tuesday.

I then run LoadIncrementalHFiles on the output of job B first, followed by
the output of job A.

Please confirm that, where the outputs of A and B intersect, reads will
return the values from B.

Thanks,
Jacques

Re: Quick Question about Bulk loading of HFiles & Timestamps

Posted by Jacques <wh...@gmail.com>.
Perfect.

thanks,
Jacques

On Fri, Aug 5, 2011 at 3:53 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi Jacques,
>
> Yes, the timestamps are set at the time the MR job runs, not the time
> they're loaded. So, you'll see the values from the job that wrote its
> output most recently.
>
> You can also specify timestamps explicitly for each KeyValue, if you
> prefer.
>
> -Todd

Re: Quick Question about Bulk loading of HFiles & Timestamps

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jacques,

Yes, the timestamps are set at the time the MR job runs, not the time
they're loaded. So, you'll see the values from the job that wrote its
output most recently.

You can also specify timestamps explicitly for each KeyValue, if you prefer.
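
To make the reasoning concrete, here is a minimal sketch of why load order
does not matter when each cell carries its own timestamp. It is a toy
in-memory model, not the HBase API: the names BulkLoadTimestampSketch, Cell,
ToyTable, jobOutput and readAfterLoading are hypothetical, and the timestamps
1000 and 2000 simply stand in for "Monday" and "Tuesday". Cells with equal
timestamps are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BulkLoadTimestampSketch {

    /** One versioned cell: a value plus the timestamp assigned when the job wrote it. */
    static final class Cell {
        final String rowKey;
        final long timestamp;
        final String value;

        Cell(String rowKey, long timestamp, String value) {
            this.rowKey = rowKey;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Toy table that keeps every loaded version of a row (hypothetical, not HBase). */
    static final class ToyTable {
        private final Map<String, List<Cell>> versions = new HashMap<>();

        /** "Bulk load": adopt the cells as written; timestamps are not touched. */
        void bulkLoad(List<Cell> file) {
            for (Cell c : file) {
                versions.computeIfAbsent(c.rowKey, k -> new ArrayList<>()).add(c);
            }
        }

        /** Read the version with the highest timestamp, or null if the row is absent. */
        String get(String rowKey) {
            List<Cell> cells = versions.get(rowKey);
            if (cells == null) {
                return null;
            }
            Cell newest = cells.get(0);
            for (Cell c : cells) {
                if (c.timestamp > newest.timestamp) {
                    newest = c;
                }
            }
            return newest.value;
        }
    }

    /** Simulated job output: every cell gets the job's run time as its timestamp. */
    static List<Cell> jobOutput(long jobTimestamp, String label, String... rowKeys) {
        List<Cell> file = new ArrayList<>();
        for (String rowKey : rowKeys) {
            file.add(new Cell(rowKey, jobTimestamp, label + ":" + rowKey));
        }
        return file;
    }

    /** Loads A (Monday) and B (Tuesday) in either order, then reads one row. */
    static String readAfterLoading(boolean loadBFirst, String rowKey) {
        List<Cell> a = jobOutput(1000L, "A", "row1", "row2"); // assumed "Monday"
        List<Cell> b = jobOutput(2000L, "B", "row2", "row3"); // assumed "Tuesday"
        ToyTable table = new ToyTable();
        if (loadBFirst) {
            table.bulkLoad(b);
            table.bulkLoad(a);
        } else {
            table.bulkLoad(a);
            table.bulkLoad(b);
        }
        return table.get(rowKey);
    }

    public static void main(String[] args) {
        // row2 is written by both jobs; B's newer timestamp wins in either load order.
        System.out.println(readAfterLoading(true, "row2"));  // prints B:row2
        System.out.println(readAfterLoading(false, "row2")); // prints B:row2
        // row1 is only in A's output, so A's value is returned.
        System.out.println(readAfterLoading(true, "row1"));  // prints A:row1
    }
}
```

Passing an explicit timestamp to each cell, as jobOutput does here, is the
toy equivalent of specifying the timestamp on each KeyValue yourself.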

-Todd

On Fri, Aug 5, 2011 at 2:10 PM, Jacques <wh...@gmail.com> wrote:
> Can someone confirm that bulk loading hfiles keeps cell timestamps from
> overwriting each other.
>
> For example:
> I run mapreduce A job on Monday.
> I run mapreduce B job on Tuesday.
>
> I then run LoadIncrementalHFiles on job B first, followed by A.
>
> Please confirm that at the intersection of outputs A & B will be the values
> from B.
>
> Thanks,
> Jacques
>



-- 
Todd Lipcon
Software Engineer, Cloudera