
Posted to user@hbase.apache.org by Shaosu Liu <Sh...@turn.com> on 2014/04/24 21:02:30 UTC

concurrent scan optimization

Hi,

I want to run a large number of concurrent scans (10k), each fetching 200 - 500 keys; each key is around 500 - 1k bytes.

I have 5 region servers using an off-heap cache; the data is fully cached, and all machines are on the same rack with gigabit connections. So GC and disk should not slow me down much.

Scan caching is set to 10000, and each region server has 200 handlers.
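For context on the caching setting: scanner caching is the number of rows the client fetches per scanner next() round trip to the region server. The sketch below counts those round trips for a 500-row scan under a deliberately simplified model (one region per scan, no open/close calls, no per-response size limit, calls spread evenly over the 5 servers and their handlers). The function name and the caching=100 comparison value are hypothetical, chosen only for illustration.

```python
import math


def next_calls_per_scan(rows: int, caching: int) -> int:
    """Scanner next() round trips needed to return `rows` rows when each
    round trip carries at most `caching` rows (simplified lower bound:
    ignores open/close calls, region boundaries and response size limits)."""
    return max(1, math.ceil(rows / caching))


SCANS = 10_000
ROWS_PER_SCAN = 500            # upper end of the 200-500 range in the post
REGION_SERVERS = 5
HANDLERS_PER_SERVER = 200

for caching in (100, 10_000):  # 100 is a hypothetical comparison value
    per_scan = next_calls_per_scan(ROWS_PER_SCAN, caching)
    total = SCANS * per_scan
    # ASSUMPTION: calls are spread evenly over all servers and handlers.
    per_handler = total / (REGION_SERVERS * HANDLERS_PER_SERVER)
    print(f"caching={caching}: {per_scan} call(s) per scan, "
          f"{total} in total, ~{per_handler:.0f} per handler")
```

Under this model, caching=10000 already reduces each scan to a single data-carrying round trip, so raising it further would not reduce the number of calls.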

Currently it takes around 30 seconds to process 10k scans concurrently. With asynchbase, it takes around 18 seconds. I think it should be possible to do this in around 1 - 2 seconds.
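Working backward from these timings shows the average throughput each run implies. This sketch assumes decimal units, that every scan returned its full 200-500 rows of 500-1,000 bytes, and (hypothetically) one gigabit NIC per region server, giving a combined line rate of 5 × 125 MB/s = 625 MB/s before protocol overhead.

```python
# Effective throughput implied by the timings in the post.
# ASSUMPTIONS: decimal units (1 MB = 10**6 bytes), every scan returned its
# full row count, and one gigabit NIC per region server.


def total_bytes(scans: int, rows_per_scan: int, bytes_per_row: int) -> int:
    """Bytes returned by all scans combined."""
    return scans * rows_per_scan * bytes_per_row


def throughput_mb_s(num_bytes: int, seconds: float) -> float:
    """Average rate needed to move num_bytes in the given time, in MB/s."""
    return num_bytes / seconds / 1e6


LOW = total_bytes(10_000, 200, 500)      # 1 GB
HIGH = total_bytes(10_000, 500, 1_000)   # 5 GB

# ASSUMPTION: 5 region servers x 125 MB/s (1 Gbit/s line rate) each.
CLUSTER_LINE_RATE_MB_S = 5 * 125

for label, seconds in (("first run", 30), ("asynchbase", 18),
                       ("target", 2), ("target", 1)):
    lo = throughput_mb_s(LOW, seconds)
    hi = throughput_mb_s(HIGH, seconds)
    print(f"{label:<10} {seconds:>2} s needs {lo:7.1f} - {hi:7.1f} MB/s "
          f"(cluster line rate {CLUSTER_LINE_RATE_MB_S} MB/s)")
```

Under these assumptions the 1 - 2 second target needs 500 MB/s to 5 GB/s, most of which exceeds what five gigabit links can carry.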

Could anybody shed light on what I am doing wrong here?

Thanks,
~Shaosu Liu

RE: concurrent scan optimization

Posted by Shaosu Liu <Sh...@turn.com>.

That is exactly what happened: I made an error in my calculation. I was trying to make sure HBase can saturate the hardware, and it does, which is what I want.

So the performance is as expected.

Thanks,
~Shaosu Liu

________________________________________
From: Vladimir Rodionov [vrodionov@carrieriq.com]
Sent: Thursday, April 24, 2014 1:38 PM
To: user@hbase.apache.org
Subject: RE: concurrent scan optimization

10k scanners fetching 200-500 keys of 0.5-1 KB each

This is somewhere between 1 GB and 5 GB of data, and 2.5 - 12.5 seconds in theory, not 1-2 sec.
If your clients are off the rack, then everything depends on your rack's upstream link capacity.
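The estimate above can be reproduced with simple arithmetic. In this sketch the 400 MB/s aggregate throughput is an assumption inferred from the 2.5 - 12.5 second figures (the thread does not state it). The single-link comparison uses the 125 MB/s line rate of one gigabit link, as a hypothetical case where off-rack clients sit behind a single gigabit uplink.

```python
# Rough estimate of scan volume and transfer time (decimal units).


def total_bytes(scanners: int, rows_per_scan: int, bytes_per_row: int) -> int:
    """Bytes returned by all scanners combined."""
    return scanners * rows_per_scan * bytes_per_row


def transfer_seconds(num_bytes: int, bytes_per_second: float) -> float:
    """Time to move num_bytes at a sustained throughput."""
    return num_bytes / bytes_per_second


SCANNERS = 10_000
AGGREGATE_THROUGHPUT = 400e6   # ASSUMPTION: inferred from 2.5-12.5 s, not stated
SINGLE_GBIT_LINK = 125e6       # 1 Gbit/s line rate, before protocol overhead

low = total_bytes(SCANNERS, 200, 500)      # 200 rows x 500 bytes
high = total_bytes(SCANNERS, 500, 1_000)   # 500 rows x 1,000 bytes

print(f"data:            {low / 1e9:.1f} - {high / 1e9:.1f} GB")
print(f"at 400 MB/s:     {transfer_seconds(low, AGGREGATE_THROUGHPUT):.1f} - "
      f"{transfer_seconds(high, AGGREGATE_THROUGHPUT):.1f} s")
print(f"one 1 Gbit link: {transfer_seconds(low, SINGLE_GBIT_LINK):.1f} - "
      f"{transfer_seconds(high, SINGLE_GBIT_LINK):.1f} s")
```

This gives 1.0 - 5.0 GB and 2.5 - 12.5 s, matching the figures above; a single gigabit link would need 8 - 40 s.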

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Shaosu Liu [Shaosu.Liu@turn.com]
Sent: Thursday, April 24, 2014 12:02 PM
To: user@hbase.apache.org
Subject: concurrent scan optimization

Hi,

I want to run a large number of concurrent scans (10k), each fetching 200 - 500 keys; each key is around 500 - 1k bytes.

I have 5 region servers using an off-heap cache; the data is fully cached, and all machines are on the same rack with gigabit connections. So GC and disk should not slow me down much.

Scan caching is set to 10000, and each region server has 200 handlers.

Currently it takes around 30 seconds to process 10k scans concurrently. With asynchbase, it takes around 18 seconds. I think it should be possible to do this in around 1 - 2 seconds.

Could anybody shed light on what I am doing wrong here?

Thanks,
~Shaosu Liu

Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

RE: concurrent scan optimization

Posted by Vladimir Rodionov <vr...@carrieriq.com>.

10k scanners fetching 200-500 keys of 0.5-1 KB each

This is somewhere between 1 GB and 5 GB of data, and 2.5 - 12.5 seconds in theory, not 1-2 sec.
If your clients are off the rack, then everything depends on your rack's upstream link capacity.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
