Posted to user@hbase.apache.org by James Taylor <jt...@salesforce.com> on 2013/02/09 02:49:46 UTC

independent scans to same region processed serially

Wanted to check with folks and see if they've seen an issue around this 
before digging in deeper. I'm on 0.94.2. If I execute in parallel 
multiple scans to different parts of the same region, they appear to be 
processed serially. It's actually faster from the client side to execute 
a single serial scan than it is to execute multiple parallel scans to 
different segments of the region. I do have region observer coprocessors 
for the table I'm scanning, but my code is not doing any synchronization.

Is there a known limitation in this area? Anyone else see anything similar?

     James

Re: independent scans to same region processed serially

Posted by James Taylor <jt...@salesforce.com>.

Filed https://issues.apache.org/jira/browse/HBASE-7805
Test case attached
It occurs only if the table has a region observer coprocessor.

     James

On 02/09/2013 11:04 AM, lars hofhansl wrote:
> If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.

Re: independent scans to same region processed serially

Posted by lars hofhansl <la...@apache.org>.

If you had something, that'd be great. Preferably with a local/single region server.
(Maybe time to take this private :) )


-- Lars



----- Original Message -----
From: James Taylor <jt...@salesforce.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <la...@apache.org>
Cc: 
Sent: Saturday, February 9, 2013 9:28 AM
Subject: Re: independent scans to same region processed serially

Ok, thanks. Are you able to repro easily, or would you like me to put something together?

James


Re: independent scans to same region processed serially

Posted by James Taylor <jt...@salesforce.com>.

Ok, thanks. Are you able to repro easily, or would you like me to put something together?

James

On Feb 9, 2013, at 9:02 AM, "lars hofhansl" <la...@apache.org> wrote:

> I looked through the code. Nothing obvious jumps out.
> We can sit together on Monday and run it through a profiler.
> 
> -- Lars
> 
> 
> 
> ----- Original Message -----
> From: James Taylor <jt...@salesforce.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <la...@apache.org>
> Cc: 
> Sent: Friday, February 8, 2013 9:52 PM
> Subject: Re: independent scans to same region processed serially
> 
> All data is in the blockcache and there are plenty of handlers. To repro, 
> you could:
> - create a table pre-split into, for example, three regions
> - execute serially a scan on the middle region
> - execute two parallel scans each on half of the middle region
> - you'd expect the parallel scan to execute near twice as fast, but 
> we're seeing it execute slower than the serial scan.
> We're using the same HConnection with different HTable instances for 
> each scan.
> 
>      James
> 
> On 02/08/2013 06:51 PM, lars hofhansl wrote:
>> Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336). Fixed in 0.94.4.
>> I assume you have enough handlers, etc. (i.e. does the same happen if you issue multiple scan requests across different regions of the same region server?)
>> 
>> 
>> -- Lars
>> 
>> 
>> 
>> ________________________________
>>    From: James Taylor <jt...@salesforce.com>
>> To: HBase User <us...@hbase.apache.org>
>> Sent: Friday, February 8, 2013 5:49 PM
>> Subject: independent scans to same region processed serially
>>   
>> Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.
>> 
>> Is there a known limitation in this area? Anyone else see anything similar?
>> 
>>       James

Re: independent scans to same region processed serially

Posted by lars hofhansl <la...@apache.org>.

I looked through the code. Nothing obvious jumps out.
We can sit together on Monday and run it through a profiler.

-- Lars



----- Original Message -----
From: James Taylor <jt...@salesforce.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <la...@apache.org>
Cc: 
Sent: Friday, February 8, 2013 9:52 PM
Subject: Re: independent scans to same region processed serially

All data is in the blockcache and there are plenty of handlers. To repro, 
you could:
- create a table pre-split into, for example, three regions
- execute serially a scan on the middle region
- execute two parallel scans each on half of the middle region
- you'd expect the parallel scan to execute near twice as fast, but 
we're seeing it execute slower than the serial scan.
We're using the same HConnection with different HTable instances for 
each scan.

     James

On 02/08/2013 06:51 PM, lars hofhansl wrote:
> Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336). Fixed in 0.94.4.
> I assume you have enough handlers, etc. (i.e. does the same happen if you issue multiple scan requests across different regions of the same region server?)
>
>
> -- Lars
>
>
>
> ________________________________
>   From: James Taylor <jt...@salesforce.com>
> To: HBase User <us...@hbase.apache.org>
> Sent: Friday, February 8, 2013 5:49 PM
> Subject: independent scans to same region processed serially
>  
> Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.
>
> Is there a known limitation in this area? Anyone else see anything similar?
>
>      James

Re: independent scans to same region processed serially

Posted by lars hofhansl <la...@apache.org>.

HBASE-7336 only deals with parallel reads on the same HFile, since each HFile has only a single reader.
For scans you want to do seek+read (as opposed to positional reads). The problem with seek+read is that it can only be done by a single thread at a time.
So HBASE-7336 just switches the read to a positional read if the reader is already locked (somewhat of a hack).
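
To make the seek+read versus positional read distinction concrete, here is a minimal sketch in plain Java NIO. It is not the HBase code: the class name PreadFallbackSketch, its methods, and the temp file are all made up for illustration, and it only assumes the behaviour described above (try the lock, and fall back to a positional read if another thread already holds it).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names come from HBase.
public class PreadFallbackSketch implements AutoCloseable {
    private final FileChannel channel;
    // Guards the channel's shared file position for the seek+read path.
    private final ReentrantLock streamLock = new ReentrantLock();

    PreadFallbackSketch(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // seek+read: moves the channel's shared position, so the caller must hold streamLock.
    void seekAndRead(ByteBuffer dst, long offset) throws IOException {
        channel.position(offset);
        while (dst.hasRemaining() && channel.read(dst) >= 0) {
            // keep reading until the buffer is full or EOF is reached
        }
    }

    // Positional read: leaves the shared position untouched, so no lock is needed.
    void positionalRead(ByteBuffer dst, long offset) throws IOException {
        long pos = offset;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos);
            if (n < 0) {
                break; // EOF
            }
            pos += n;
        }
    }

    // Returns true if the seek+read path was taken, false if it fell back to a positional read.
    boolean read(ByteBuffer dst, long offset) throws IOException {
        if (streamLock.tryLock()) {
            try {
                seekAndRead(dst, offset);
                return true;
            } finally {
                streamLock.unlock();
            }
        }
        // Another thread is in the middle of a seek+read: do not wait for it.
        positionalRead(dst, offset);
        return false;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pread-sketch", ".bin");
        Files.write(file, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (PreadFallbackSketch reader = new PreadFallbackSketch(file)) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            boolean usedSeek = reader.read(buf, 10);
            System.out.println(new String(buf.array(), StandardCharsets.US_ASCII)
                    + " via " + (usedSeek ? "seek+read" : "positional read"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The point of tryLock() rather than lock() is that a second reader never queues behind the first; it simply takes the other read path.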


-- Lars

________________________________
From: ramkrishna vasudevan <ra...@gmail.com>
To: user@hbase.apache.org 
Sent: Saturday, February 9, 2013 2:48 AM
Subject: Re: independent scans to same region processed serially

What do you see in the thread dump?  Maybe HBASE-7336 deals with scans
hitting the same block of data. But I see from your mail that the scans are
independent of each other and they scan different data but in the same
Region.

Regards
Ram


Re: independent scans to same region processed serially

Posted by ramkrishna vasudevan <ra...@gmail.com>.

What do you see in the thread dump?  Maybe HBASE-7336 deals with scans
hitting the same block of data. But I see from your mail that the scans are
independent of each other and they scan different data but in the same
Region.

Regards
Ram

On Sat, Feb 9, 2013 at 11:22 AM, James Taylor <jt...@salesforce.com> wrote:

> All data is in the blockcache and there are plenty of handlers. To repro, you
> could:
> - create a table pre-split into, for example, three regions
> - execute serially a scan on the middle region
> - execute two parallel scans each on half of the middle region
> - you'd expect the parallel scan to execute near twice as fast, but we're
> seeing it execute slower than the serial scan.
> We're using the same HConnection with different HTable instances for each
> scan.
>
>     James
>
>
> On 02/08/2013 06:51 PM, lars hofhansl wrote:
>
>> Is your data all in the blockcache? Otherwise you might have run into
>> HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336). Fixed in 0.94.4.
>> I assume you have enough handlers, etc. (i.e. does the same happen if
>> you issue multiple scan requests across different regions of the same region
>> server?)
>>
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>>   From: James Taylor <jt...@salesforce.com>
>> To: HBase User <us...@hbase.apache.org>
>> Sent: Friday, February 8, 2013 5:49 PM
>> Subject: independent scans to same region processed serially
>>   Wanted to check with folks and see if they've seen an issue around this
>> before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple
>> scans to different parts of the same region, they appear to be processed
>> serially. It's actually faster from the client side to execute a single
>> serial scan than it is to execute multiple parallel scans to different
>> segments of the region. I do have region observer coprocessors for the
>> table I'm scanning, but my code is not doing any synchronization.
>>
>> Is there a known limitation in this area? Anyone else see anything
>> similar?
>>
>>      James
>>
>
>

Re: independent scans to same region processed serially

Posted by James Taylor <jt...@salesforce.com>.

All data is in the blockcache and there are plenty of handlers. To repro, 
you could:
- create a table pre-split into, for example, three regions
- execute serially a scan on the middle region
- execute two parallel scans each on half of the middle region
- you'd expect the parallel scan to execute near twice as fast, but 
we're seeing it execute slower than the serial scan.
We're using the same HConnection with different HTable instances for 
each scan.
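
The comparison in the steps above can be sketched as a small timing harness. This is only an outline under stated assumptions: RangeScan is a hypothetical stand-in for whatever actually scans a row range (in a real repro it would open a scanner with the given start and stop rows), the integer row keys are invented for illustration, and the fake scan in main() does no I/O, so its timings say nothing about HBase itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness; no HBase classes are used here.
public class ParallelScanTiming {
    /** Stand-in for a scan over [startRow, stopRow); returns the number of rows seen. */
    interface RangeScan {
        long scan(int startRow, int stopRow) throws Exception;
    }

    /** Runs one scan over the whole range. Returns {rowCount, elapsedNanos}. */
    static long[] timeSerial(RangeScan s, int start, int stop) throws Exception {
        long t0 = System.nanoTime();
        long rows = s.scan(start, stop);
        return new long[] { rows, System.nanoTime() - t0 };
    }

    /** Splits the range into `parts` contiguous chunks and scans them concurrently. */
    static long[] timeParallel(final RangeScan s, int start, int stop, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int span = stop - start;
            for (int i = 0; i < parts; i++) {
                final int lo = start + (int) ((long) span * i / parts);
                final int hi = start + (int) ((long) span * (i + 1) / parts);
                tasks.add(() -> s.scan(lo, hi));
            }
            long t0 = System.nanoTime();
            long rows = 0;
            // invokeAll blocks until every chunk has finished.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                rows += f.get();
            }
            return new long[] { rows, System.nanoTime() - t0 };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake scan that just counts rows; replace with a real scan to reproduce the issue.
        RangeScan fake = (lo, hi) -> {
            long n = 0;
            for (int r = lo; r < hi; r++) {
                n++;
            }
            return n;
        };
        long[] serial = timeSerial(fake, 0, 1_000_000);
        long[] parallel = timeParallel(fake, 0, 1_000_000, 2);
        System.out.println("serial:   rows=" + serial[0] + " nanos=" + serial[1]);
        System.out.println("parallel: rows=" + parallel[0] + " nanos=" + parallel[1]);
    }
}
```

Both runs should report the same row count; the symptom described here is the parallel elapsed time coming out higher than the serial one.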

     James

On 02/08/2013 06:51 PM, lars hofhansl wrote:
> Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336). Fixed in 0.94.4.
> I assume you have enough handlers, etc. (i.e. does the same happen if you issue multiple scan requests across different regions of the same region server?)
>
>
> -- Lars
>
>
>
> ________________________________
>   From: James Taylor <jt...@salesforce.com>
> To: HBase User <us...@hbase.apache.org>
> Sent: Friday, February 8, 2013 5:49 PM
> Subject: independent scans to same region processed serially
>   
> Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.
>
> Is there a known limitation in this area? Anyone else see anything similar?
>
>      James

Re: independent scans to same region processed serially

Posted by lars hofhansl <la...@apache.org>.

Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336). Fixed in 0.94.4.
I assume you have enough handlers, etc. (i.e. does the same happen if you issue multiple scan requests across different regions of the same region server?)


-- Lars



________________________________
 From: James Taylor <jt...@salesforce.com>
To: HBase User <us...@hbase.apache.org> 
Sent: Friday, February 8, 2013 5:49 PM
Subject: independent scans to same region processed serially
 
Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.

Is there a known limitation in this area? Anyone else see anything similar?

    James

Re: independent scans to same region processed serially

Posted by Marcos Ortiz <ml...@uci.cu>.

Regards, James,
Hari Kumar, from Ericsson Labs, talked about these issues on the Data & 
Knowledge blog:
http://labs.ericsson.com/blog/hbase-performance-tuners

It would be nice to talk with him and convince him to share his knowledge 
here on the list, or at the next HBaseCon.


On 02/08/2013 08:49 PM, James Taylor wrote:
> Wanted to check with folks and see if they've seen an issue around 
> this before digging in deeper. I'm on 0.94.2. If I execute in parallel 
> multiple scans to different parts of the same region, they appear to 
> be processed serially. It's actually faster from the client side to 
> execute a single serial scan than it is to execute multiple parallel 
> scans to different segments of the region. I do have region observer 
> coprocessors for the table I'm scanning, but my code is not doing any 
> synchronization.
>
> Is there a known limitation in this area? Anyone else see anything 
> similar?
>
>     James

-- 
Marcos Ortiz Valmaseda,
Product Manager & Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>