Posted to user@hbase.apache.org by Amit Sela <am...@infolinks.com> on 2012/11/12 15:39:19 UTC

scan is slower after bulk load

Hi all,

Does anyone have any idea why scanning over a specific range in a table is about
20% slower if that data (that specific range) was just inserted into HBase
using bulk load?

I do the bulk load programmatically with LoadIncrementalHFiles.

Thanks.

Re: scan is slower after bulk load

Posted by Amit Sela <am...@infolinks.com>.
I gave it a few more shots and it was back to normal...
Bulk loading is faster but, more importantly (for us), it's more stable and
doesn't cause a full GC in the region server even when loading more than
usual.
The map time remains the same. For the reduce we chose to write out a sequence
file, so it's quite fast, and the bulk load map is extremely fast.
The bulk load reduce is also fast, but it depends on the number of regions
in the table. We used our own code so that only specific regions are
targeted (I think I posted it).

Bottom line - about 30% faster. But I expect it to handle bigger loads
better.
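
A single slow scan can be noise or a cold start, which would fit the result
going back to normal after a few more tries. Below is a minimal sketch of a
timing helper for this kind of comparison, assuming the scan is wrapped in a
Runnable. The names ScanTiming, time and medianAfterWarmup are hypothetical,
and nothing here is HBase API; the workload in main is only a stand-in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: time a task several times and report a median. */
class ScanTiming {

    /** Runs the task `runs` times and returns each duration in nanoseconds. */
    static List<Long> time(Runnable task, int runs) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples.add(System.nanoTime() - start);
        }
        return samples;
    }

    /** Median of the samples that remain after dropping the first `warmup` runs. */
    static long medianAfterWarmup(List<Long> samples, int warmup) {
        if (warmup < 0 || warmup >= samples.size()) {
            throw new IllegalArgumentException("warmup must leave at least one sample");
        }
        List<Long> kept = new ArrayList<>(samples.subList(warmup, samples.size()));
        Collections.sort(kept);
        int mid = kept.size() / 2;
        return kept.size() % 2 == 1
                ? kept.get(mid)
                : (kept.get(mid - 1) + kept.get(mid)) / 2;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption); replace the body with the real range scan.
        Runnable fakeScan = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) throw new IllegalStateException("unreachable");
        };
        List<Long> samples = time(fakeScan, 10);
        System.out.println("median after 3 warm-up runs (ns): "
                + medianAfterWarmup(samples, 3));
    }
}
```

Comparing medians taken this way before and after a bulk load makes it easier
to tell a lasting slowdown from a one-off.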
On Nov 22, 2012 11:51 PM, "Asaf Mesika" <as...@gmail.com> wrote:

> Did you end up finding the answer?
> How fast is this method of insertion relative to a simple insert of
> List<Put> ?

Re: scan is slower after bulk load

Posted by Asaf Mesika <as...@gmail.com>.
Did you end up finding the answer?
How fast is this method of insertion relative to a simple insert of List<Put> ?


On 13 Nov 2012, at 02:29, Bijieshan <bi...@huawei.com> wrote:

> I think one possible reason is block caching. Have you turned the block caching off during scanning?
> 
> Regards,
>  Jieshan


RE: scan is slower after bulk load

Posted by Bijieshan <bi...@huawei.com>.
I think one possible reason is block caching. Have you turned the block caching off during scanning?

Regards,
  Jieshan
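
One possible reading of the block-caching suggestion: files that were just
bulk loaded have not been read yet, so none of their blocks are cached, and
the first scan over that range pays for every block read. The toy LRU cache
below is only a sketch of that effect; ToyBlockCache and its methods are
hypothetical names, not HBase classes, and the "disk read" is a stand-in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU block cache that counts hits and misses during range scans. */
class ToyBlockCache {
    private final Map<Integer, byte[]> blocks;
    private int hits = 0;
    private int misses = 0;

    ToyBlockCache(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > capacity; // evict the least recently used block
            }
        };
    }

    /** Reads one block, loading it "from disk" on a miss. */
    byte[] read(int blockId) {
        byte[] block = blocks.get(blockId);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = new byte[8]; // stand-in for an expensive disk read
        blocks.put(blockId, block);
        return block;
    }

    /** Reads blocks first..last in order, like a range scan. */
    void scan(int first, int last) {
        for (int id = first; id <= last; id++) {
            read(id);
        }
    }

    int hits() { return hits; }
    int misses() { return misses; }

    public static void main(String[] args) {
        ToyBlockCache cache = new ToyBlockCache(100);
        cache.scan(0, 49); // first scan after the load: nothing is cached yet
        System.out.println("after cold scan: hits=" + cache.hits()
                + " misses=" + cache.misses());
        cache.scan(0, 49); // repeat scan: the same blocks are now cached
        System.out.println("after warm scan: hits=" + cache.hits()
                + " misses=" + cache.misses());
    }
}
```

If the cache is smaller than the scanned range, or caching is turned off for
the scan, the repeat scan gets no such benefit.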
________________________________________
From: Mohammad Tariq [dontariq@gmail.com]
Sent: Tuesday, November 13, 2012 1:04
To: user@hbase.apache.org
Subject: Re: scan is slower after bulk load

may be because bulk load writes to the same region thus putting the entire
load on a single region server.

Regards,
    Mohammad Tariq




Re: scan is slower after bulk load

Posted by Mohammad Tariq <do...@gmail.com>.
Maybe it's because the bulk load writes to the same region, thus putting the entire
load on a single region server.

Regards,
    Mohammad Tariq



On Mon, Nov 12, 2012 at 9:15 PM, Michael Segel <mi...@hotmail.com>wrote:

> Just a guess... have you done any compactions on the table post bulk load?
>

Re: scan is slower after bulk load

Posted by Michael Segel <mi...@hotmail.com>.
Just a guess... have you done any compactions on the table post bulk load? 
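
Some background on why compaction could matter here: a bulk load adds new
store files next to the existing ones, and a scan has to merge all the sorted
files of a store until a compaction rewrites them into fewer files. The sketch
below only illustrates that merge cost with plain sorted lists; StoreFileMerge
and scan are hypothetical names, not HBase code, and whether this explains the
20% seen in this thread is not established.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model: a scan merges several sorted "store files" through a heap. */
class StoreFileMerge {

    /**
     * Returns all keys in sorted order and adds the number of key
     * comparisons made by the heap to comparisons[0].
     */
    static List<String> scan(List<List<String>> storeFiles, long[] comparisons) {
        // Each heap entry is {fileIndex, positionInFile}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> {
            comparisons[0]++;
            return storeFiles.get(a[0]).get(a[1])
                    .compareTo(storeFiles.get(b[0]).get(b[1]));
        });
        for (int f = 0; f < storeFiles.size(); f++) {
            if (!storeFiles.get(f).isEmpty()) {
                heap.add(new int[] {f, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> file = storeFiles.get(top[0]);
            out.add(file.get(top[1]));
            if (top[1] + 1 < file.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that file
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Four small files with interleaved row keys, as after repeated loads.
        List<List<String>> files = new ArrayList<>();
        for (int f = 0; f < 4; f++) {
            List<String> file = new ArrayList<>();
            for (int i = f; i < 1000; i += 4) {
                file.add(String.format("row-%04d", i));
            }
            files.add(file);
        }
        long[] before = {0};
        List<String> merged = scan(files, before);

        // "Compaction": the same keys rewritten as a single sorted file.
        long[] after = {0};
        scan(Collections.singletonList(merged), after);

        System.out.println("key comparisons with 4 files: " + before[0]);
        System.out.println("key comparisons with 1 file:  " + after[0]);
    }
}
```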

On Nov 12, 2012, at 8:44 AM, Marcos Ortiz <ml...@uci.cu> wrote:

> Regards, Amit.
> Did you tuned the RegionServer where you has that data range hosted?
> Why do you say that scans are slower after a bulk load?
> Did you test it before bulk load?
> 
> HBase version?


Re: scan is slower after bulk load

Posted by Marcos Ortiz <ml...@uci.cu>.
Regards, Amit.
Did you tune the RegionServer where that data range is hosted?
Why do you say that scans are slower after a bulk load?
Did you test it before the bulk load?

HBase version?

On 11/12/2012 09:39 AM, Amit Sela wrote:
> Hi all,
>
> Anyone has any idea why scanning over specific range in a table is about
> 20% slower if that data (that specific range) was just inserted into HBase
> using bulk load ?
>
> I do the bulk load programmatically with  LoadIncrementalHFiles.
>
> Thanks.
>

-- 

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>



10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci