Posted to user@hbase.apache.org by Bryan Keller <br...@gmail.com> on 2013/07/01 06:23:52 UTC

Re: Poor HBase map-reduce scan performance

I'll attach my patch to HBASE-8369 tomorrow.

On Jun 28, 2013, at 10:56 AM, lars hofhansl <la...@apache.org> wrote:

> If we can make a clean patch with minimal impact to existing code I would be supportive of a backport to 0.94.
> 
> -- Lars
> 
> 
> 
> ----- Original Message -----
> From: Bryan Keller <br...@gmail.com>
> To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
> Cc: 
> Sent: Tuesday, June 25, 2013 1:56 AM
> Subject: Re: Poor HBase map-reduce scan performance
> 
> I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected I suppose. I'm seeing about 3.6x faster performance vs TableInputFormat. Also, HBase doesn't get bogged down during a scan as the regionserver is being bypassed. I'm very excited by this. There are some issues with file permissions and library dependencies but nothing that can't be worked out.
> 
> On Jun 5, 2013, at 6:03 PM, lars hofhansl <la...@apache.org> wrote:
> 
>> That's exactly the kind of pre-fetching I was investigating a bit ago (made a patch, but ran out of time).
>> This pre-fetching is strictly client-only: the client keeps the server busy by filling up a 2nd buffer while it is processing the previous batch.
>> 
>> 
>> -- Lars
>> 
>> 
>> 
>> ________________________________
>> From: Sandy Pratt <pr...@adobe.com>
>> To: "user@hbase.apache.org" <us...@hbase.apache.org> 
>> Sent: Wednesday, June 5, 2013 10:58 AM
>> Subject: Re: Poor HBase map-reduce scan performance
>> 
>> 
>> Yong,
>> 
>> As a thought experiment, imagine how it impacts the throughput of TCP to
>> keep the window size at 1.  That means there's only one packet in flight
>> at a time, and total throughput is a fraction of what it could be.
>> 
>> That's effectively what happens with RPC.  The server sends a batch, then
>> does nothing while it waits for the client to ask for more.  During that
>> time, the pipe between them is empty.  Increasing the batch size can help
>> a bit, in essence creating a really huge packet, but the problem remains.
>> There will always be stalls in the pipe.
>> 
>> What you want is for the window size to be large enough that the pipe is
>> saturated.  A streaming API accomplishes that by stuffing data down the
>> network pipe as quickly as possible.
>> 
>> Sandy
>> 
>> On 6/5/13 7:55 AM, "yonghu" <yo...@gmail.com> wrote:
>> 
>>> Can anyone explain why client + RPC + server will decrease the performance
>>> of scanning? I mean the RegionServer and TaskTracker are on the same node
>>> when you use MapReduce to scan the HBase table. So, in my understanding,
>>> there will be no RPC cost.
>>> 
>>> Thanks!
>>> 
>>> Yong
>>> 
>>> 
>>> On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt <pr...@adobe.com> wrote:
>>> 
>>>> https://issues.apache.org/jira/browse/HBASE-8691
>>>> 
>>>> 
>>>> On 6/4/13 6:11 PM, "Sandy Pratt" <pr...@adobe.com> wrote:
>>>> 
>>>>> Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
>>>>> with an update in the meantime.
>>>>> 
>>>>> I tried a number of different approaches to eliminate latency and
>>>>> "bubbles" in the scan pipeline, and eventually arrived at adding a
>>>>> streaming scan API to the region server, along with refactoring the scan
>>>>> interface into an event-driven message receiver interface.  In so doing, I
>>>>> was able to take scan speed on my cluster from 59,537 records/sec with the
>>>>> classic scanner to 222,703 records/sec with my new scan API.
>>>>> Needless to say, I'm pleased ;)
>>>>> 
>>>>> More details forthcoming when I get a chance.
>>>>> 
>>>>> Thanks,
>>>>> Sandy
>>>>> 
>>>>> On 5/23/13 3:47 PM, "Ted Yu" <yu...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks for the update, Sandy.
>>>>>> 
>>>>>> If you can open a JIRA and attach your producer / consumer scanner there,
>>>>>> that would be great.
>>>>>> 
>>>>>> On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <pr...@adobe.com> wrote:
>>>>>> 
>>>>>>> I wrote myself a Scanner wrapper that uses a producer/consumer queue to
>>>>>>> keep the client fed with a full buffer as much as possible.  When scanning
>>>>>>> my table with scanner caching at 100 records, I see about a 24% uplift in
>>>>>>> performance (~35k records/sec with the ClientScanner and ~44k records/sec
>>>>>>> with my P/C scanner).  However, when I set scanner caching to 5000, it's
>>>>>>> more of a wash compared to the standard ClientScanner: ~53k records/sec
>>>>>>> with the ClientScanner and ~60k records/sec with the P/C scanner.
>>>>>>> 
>>>>>>> I'm not sure what to make of those results.  I think next I'll shut down
>>>>>>> HBase and read the HFiles directly, to see if there's a drop-off in
>>>>>>> performance between reading them directly vs. via the RegionServer.
>>>>>>> 
>>>>>>> I still think that to really solve this there needs to be a sliding window
>>>>>>> of records in flight between disk and RS, and between RS and client.  I'm
>>>>>>> thinking there's probably a single batch of records in flight between RS
>>>>>>> and client at the moment.
>>>>>>> 
>>>>>>> Sandy
>>>>>>> 
>>>>>>> On 5/23/13 8:45 AM, "Bryan Keller" <br...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I am considering scanning a snapshot instead of the table. I believe this
>>>>>>>> is what the ExportSnapshot class does. If I could use the scanning code
>>>>>>>> from ExportSnapshot then I would be able to scan the HDFS files directly
>>>>>>>> and bypass the regionservers. This could potentially give me a huge boost
>>>>>>>> in performance for full table scans. However, it doesn't really address
>>>>>>>> the poor scan performance against a table.
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>
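
The producer/consumer scanner wrapper Sandy describes in the quoted thread above can be sketched roughly as follows. This is not Sandy's code and does not use the HBase client API: it is a minimal illustration in Python, assuming a hypothetical fetch_batch callable that stands in for one scanner RPC and returns an empty list when the scan is exhausted. The names PrefetchingScanner, make_fake_fetcher and max_batches_in_flight are invented for the example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the scan


class PrefetchingScanner:
    """Wrap a batch-fetching callable so fetching overlaps with processing.

    fetch_batch is a hypothetical stand-in for one scanner RPC: it returns
    the next list of records, or an empty list when the scan is exhausted.
    """

    def __init__(self, fetch_batch, max_batches_in_flight=2):
        self._fetch_batch = fetch_batch
        # Bounded queue: the producer blocks once this many batches are
        # buffered, which caps memory use on the client.
        self._batches = queue.Queue(maxsize=max_batches_in_flight)
        self._error = None
        self._producer = threading.Thread(target=self._produce, daemon=True)
        self._producer.start()

    def _produce(self):
        try:
            while True:
                batch = self._fetch_batch()
                if not batch:
                    break
                self._batches.put(batch)
        except Exception as exc:  # hand the failure over to the consumer
            self._error = exc
        finally:
            self._batches.put(_DONE)

    def __iter__(self):
        while True:
            batch = self._batches.get()
            if batch is _DONE:
                if self._error is not None:
                    raise self._error
                return
            yield from batch


def make_fake_fetcher(total_records, batch_size):
    """Simulate a server that hands out record ids in fixed-size batches."""
    state = {"next": 0}

    def fetch_batch():
        start = state["next"]
        end = min(start + batch_size, total_records)
        state["next"] = end
        return list(range(start, end))

    return fetch_batch


records = list(PrefetchingScanner(make_fake_fetcher(1000, 100)))
```

The bounded queue is the design choice that matters here: the background thread keeps asking for the next batch while the caller is still working through the previous one, but never runs more than max_batches_in_flight batches ahead. A caller that abandons the iteration early leaves the producer thread blocked, which a real implementation would need to handle with an explicit close.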

Re: Poor HBase map-reduce scan performance

Posted by Bryan Keller <br...@gmail.com>.

I attached my patch to the JIRA issue, in case anyone is interested. It can pretty easily be used on its own without patching HBase. I am currently doing this.


On Jul 1, 2013, at 2:23 PM, Enis Söztutar <en...@gmail.com> wrote:

> Bryan,
> 
> 3.6x improvement seems exciting. The ballpark difference between an HBase scan
> and an HDFS scan is of that order, so it is expected I guess.
> 
> I plan to get back to the trunk patch, add more tests etc. next week. In the
> meantime, if you have any changes to the patch, please attach the patch.
> 
> Enis

Re: Poor HBase map-reduce scan performance

Posted by Enis Söztutar <en...@gmail.com>.

Bryan,

3.6x improvement seems exciting. The ballpark difference between an HBase scan
and an HDFS scan is of that order, so it is expected I guess.

I plan to get back to the trunk patch, add more tests etc. next week. In the
meantime, if you have any changes to the patch, please attach the patch.

Enis


On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl <la...@apache.org> wrote:

> Absolutely.
>
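
Sandy's window-size analogy from earlier in this thread can be turned into a back-of-the-envelope model. The sketch below is only an illustration: the function names are invented, and the latency and per-record costs are made-up inputs, not measurements from anyone's cluster. It assumes that in the request/response scanner each batch costs one round trip plus the server's time to produce it plus the client's time to process it, all in series, while an ideal streaming pipeline is limited only by the slower of the two sides.

```python
def stop_and_wait_rate(batch_size, rtt_s, server_s_per_rec, client_s_per_rec):
    """Records/sec when server and client strictly take turns.

    Each batch pays one round trip, then the server's work, then the
    client's work, with nothing overlapping.
    """
    per_batch = rtt_s + batch_size * (server_s_per_rec + client_s_per_rec)
    return batch_size / per_batch


def pipelined_rate(server_s_per_rec, client_s_per_rec):
    """Records/sec for an ideal pipeline: the slower side sets the pace."""
    return 1.0 / max(server_s_per_rec, client_s_per_rec)


# Illustrative inputs only: 1 ms round trip, 10 microseconds of server work
# and 10 microseconds of client work per record.
RTT = 0.001
SERVER = 10e-6
CLIENT = 10e-6

small_batch = stop_and_wait_rate(100, RTT, SERVER, CLIENT)
large_batch = stop_and_wait_rate(5000, RTT, SERVER, CLIENT)
streaming = pipelined_rate(SERVER, CLIENT)

print(f"caching=100   {small_batch:,.0f} records/sec")
print(f"caching=5000  {large_batch:,.0f} records/sec")
print(f"streaming     {streaming:,.0f} records/sec")
```

Under these assumptions a larger batch amortizes the round trip but never lets the server and client work at the same time, so the stop-and-wait rate stays below the pipelined rate however large the batch gets. That matches the shape of the argument in the thread, though the real numbers depend on factors this model ignores.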

Re: Poor HBase map-reduce scan performance

Posted by lars hofhansl <la...@apache.org>.

Absolutely.



----- Original Message -----
From: Ted Yu <yu...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Sunday, June 30, 2013 9:32 PM
Subject: Re: Poor HBase map-reduce scan performance

Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.

I think the trunk patch should be finalized before backporting.

Cheers

On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller <br...@gmail.com> wrote:

> I'll attach my patch to HBASE-8369 tomorrow.
>
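
The client-only pre-fetching Lars mentions in this thread, where a second buffer fills while the previous batch is being processed, might look something like the sketch below. It is not Lars's patch and does not touch the HBase client: fetch_batch is a hypothetical stand-in for one scanner RPC that returns an empty list once the scan is finished, and double_buffered_scan is a name invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def double_buffered_scan(fetch_batch):
    """Yield records while the next batch is fetched in the background.

    fetch_batch is a hypothetical stand-in for one scanner RPC; it returns
    a list of records, or an empty list once the scan is finished.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_batch)  # request the first buffer
        while True:
            current = pending.result()  # wait for the buffer to arrive
            if not current:
                return
            # Ask for the second buffer before touching the first one, so
            # the server works while the caller processes `current`.
            pending = pool.submit(fetch_batch)
            yield from current


# Tiny simulated scan: three RPCs, the last one signalling the end.
batches = iter([[1, 2, 3], [4, 5], []])
result = list(double_buffered_scan(lambda: next(batches)))
```

With a single worker there is at most one request outstanding at a time, so the client holds no more than two buffers: the one being consumed and the one being filled.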

Re: Poor HBase map-reduce scan performance

Posted by Ted Yu <yu...@gmail.com>.

Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.

I think the trunk patch should be finalized before backporting.

Cheers

On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller <br...@gmail.com> wrote:

> I'll attach my patch to HBASE-8369 tomorrow.