
Posted to user@hbase.apache.org by Vinay Kashyap <vi...@gmail.com> on 2013/10/22 13:47:22 UTC

High CPU utilization in few Region servers during read

Hi,

I am running HBase 0.94.6 (cdh-4.4.0) with 25 region servers.
I am testing a scenario to read and write only from/to RAM.

I have the following settings:
Table precreated with 25 regions.
HFile size - 48 GB
MemStore size - 72 GB
Heap size - 96 GB

These settings are to avoid any flushes to the disk. Data need not be
persisted.

I am able to achieve a load throughput of 75K ops per region server.
While reading, 23 region servers serve requests at a throughput of 55K
ops, but 2 of the region servers, seemingly at random, always end up
serving only a few hundred ops.

On these 2 region servers the CPU usage stays very high, close to 100%
continuously, which brings down the overall throughput. I did not observe
any long GC pauses during this time.

I also tried applying the patch for HBASE-9428, but still faced the
same problem.
Thread dump for the affected region server is at
http://pastebin.com/JGx9gXnm

Any hints on how to solve this?

Thanks and regards
Vinay S Kashyap
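
A quick way to confirm which region servers are lagging is to compare each server's read rate with the cluster median. The sketch below is only an illustration in Python: the server names and the 10% cutoff are hypothetical, and the rates are placeholders based on the approximate figures in the post above (55K ops on 23 servers, a few hundred ops on the other 2).

```python
from statistics import median

def find_slow_servers(ops_by_server, ratio=0.10):
    """Return the servers whose throughput is below `ratio` times the cluster median."""
    mid = median(ops_by_server.values())
    return sorted(name for name, ops in ops_by_server.items() if ops < ratio * mid)

# Hypothetical server names; 300 and 200 stand in for "a few hundred ops".
rates = {f"rs{i:02d}": 55_000 for i in range(1, 24)}
rates.update({"rs24": 300, "rs25": 200})

print(find_slow_servers(rates))
```

Using the median rather than the mean keeps the two slow servers from pulling the reference value down.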

Re: Fwd: High CPU utilization in few Region servers during read

Posted by Ted Yu <yu...@gmail.com>.

Vinay:
Can you get jstack output from the 2 region servers during load and pastebin it?

Thanks
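
Once a jstack dump is captured, it can help to count how many threads share the same topmost frame instead of reading the dump by eye. The following is a minimal Python sketch, assuming the usual jstack text layout (a quoted thread name on the header line, then indented "at ..." frame lines). The thread names in the sample are made up; the HBase frame is one of those reported elsewhere in this thread.

```python
import re
from collections import Counter

# A thread section in jstack output starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"[^"]+"')
# Stack frames are indented lines of the form "at package.Class.method(...)".
FRAME = re.compile(r'^\s+at (?P<frame>[\w.$<>]+)\(')

def count_top_frames(dump_text):
    """Count how many threads have each method as their topmost stack frame."""
    counts = Counter()
    awaiting_top_frame = False
    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            awaiting_top_frame = True      # the next "at ..." line is the top frame
            continue
        match = FRAME.match(line)
        if match and awaiting_top_frame:
            counts[match.group("frame")] += 1
            awaiting_top_frame = False     # ignore deeper frames of this thread
    return counts

# Hypothetical, heavily shortened dump used only to exercise the function.
sample = '''"handler-1" daemon prio=10 runnable
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1535)
    at java.util.concurrent.ConcurrentSkipListMap.findNear(ConcurrentSkipListMap.java:1346)

"handler-2" daemon prio=10 runnable
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1535)

"idle-1" daemon prio=10 waiting
   java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Native Method)
'''

counts = count_top_frames(sample)
for frame, n in counts.most_common():
    print(n, frame)
```

Taking several dumps a few seconds apart and comparing the counts gives a better picture than a single snapshot.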



Re: Fwd: High CPU utilization in few Region servers during read

Posted by Vinay Kashyap <vi...@gmail.com>.

Hi Lars,

Yes, I understand that it is not advisable to configure the memstore with such
a huge value, but I wanted to test how HBase scales when the data is held
completely in memory. I also want to avoid disk access, as I have a single
disk configured with RAID-1, which is not optimized for HDFS.

I have also disabled writes to the WAL.

A few more observations:
1. Out of 25 region servers, 23 serve around 60K ops until completion, but
the other 2 start with no more than 17K ops.
2. These 2 region servers end up at less than 100 ops in the final stages,
dragging the overall time out to a much larger value (say 1500 seconds),
whereas all the other region servers finish much earlier (say in 200 seconds).
3. I verified that no other processes are running on these 2 region servers
that could add load.
4. If the number of regions is increased (say the table is precreated with 50,
100, 200, etc.), the load on these region servers is reduced and they serve
more requests, but still with a notable difference from the other region
servers.

So I wanted to understand what extra work these 2 region servers are burdened
with that leads to such reduced performance.


Thanks and regards
Vinay Kashyap
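
Observation 2 shows why two slow servers matter so much: the run is only finished when the slowest server finishes. Below is a rough Python sketch of that arithmetic, using the approximate times from the message (200 seconds for the fast servers, 1500 seconds for the slow ones). The per-server row count is a made-up placeholder and cancels out of the final ratio.

```python
def run_summary(finish_seconds, rows_per_server):
    """Return wall-clock time and aggregate rate when every server must finish."""
    wall_clock = max(finish_seconds)              # the slowest server sets the total time
    total_rows = rows_per_server * len(finish_seconds)
    return wall_clock, total_rows / wall_clock

rows = 10_000_000                    # hypothetical number of rows read per server
skewed = [200] * 23 + [1500] * 2     # times quoted in the message
even = [200] * 25                    # the same run without the two stragglers

skewed_time, skewed_rate = run_summary(skewed, rows)
even_time, even_rate = run_summary(even, rows)
print(skewed_time, even_time, even_rate / skewed_rate)
```

With these inputs the aggregate rate of the skewed run is 7.5 times lower than that of the even run, although only 2 of the 25 servers are slow.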





Re: Fwd: High CPU utilization in few Region servers during read

Posted by lars hofhansl <la...@apache.org>.

No, this is different.

All your data is still in the memstore.

The memstore is organized as a skip list, and nobody has ever tested that with 72gb. 256mb, 512mb, 1gb, sure... 72gb... no way.
The same goes for 96gb of Java heap: not with the Oracle JVM or OpenJDK, and not with an application that was not specifically designed for such large heaps.
I would keep it under 30gb.


I think what you want is the following:
1. Disable WAL writes (you don't care if you lose data).
2. Lower your memstore size, so you'll see some flushes and eventually compactions.
3. Don't give the JVM more than 30g or so.
4. Flush your memstores to disk. They'll end up in the block cache that way.

Currently we can't fill the block cache without flushing to disk.

Maybe HBase is not the right solution. If you need a large ephemeral in-memory store, maybe look at memcache?


-- Lars
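
The sizing advice above can be turned into a small sanity check. This Python sketch is only an illustration: the thresholds (30 GB heap, 1 GB memstore) are the rough figures mentioned in this reply, not limits enforced by HBase, and the function name is hypothetical.

```python
def review_settings(heap_gb, memstore_gb, max_heap_gb=30, max_tested_memstore_gb=1):
    """Flag settings that exceed the rough limits suggested in the reply above."""
    notes = []
    if heap_gb > max_heap_gb:
        notes.append(f"heap {heap_gb} GB exceeds the suggested {max_heap_gb} GB")
    if memstore_gb > max_tested_memstore_gb:
        notes.append(
            f"memstore {memstore_gb} GB is beyond the commonly tested {max_tested_memstore_gb} GB"
        )
    # Always report how much of the heap the memstore would claim.
    notes.append(f"memstore is {memstore_gb / heap_gb:.0%} of the heap")
    return notes

# Settings from the original post: 96 GB heap, 72 GB memstore.
for note in review_settings(heap_gb=96, memstore_gb=72):
    print(note)
```

For the settings in the original post, the memstore alone would take three quarters of the heap.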




Fwd: High CPU utilization in few Region servers during read

Posted by Vinay Kashyap <vi...@gmail.com>.

From the thread dump, it looks like many threads are stuck at:

org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1535)
org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1523)
java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(ConcurrentSkipListMap.java:647)
java.util.concurrent.ConcurrentSkipListMap.findNear(ConcurrentSkipListMap.java:1346)

Is this similar to the HBASE-9428 issue?

Waiting for some help regarding this. :)


Thanks and regards
Vinay S Kashyap
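
The frames above show lookups in a ConcurrentSkipListMap calling the KeyValue comparator. As a rough order-of-magnitude estimate, a lookup among n sorted entries needs on the order of log2(n) comparisons. The Python sketch below applies that estimate to a few memstore sizes. The 1 KB average entry size is purely an assumption, and log2(n) is an idealized figure, not a measurement of the Java skip list.

```python
import math

def expected_comparisons(memstore_bytes, avg_entry_bytes):
    """Rough log2(n) estimate of comparator calls for one lookup among n sorted entries."""
    entries = memstore_bytes // avg_entry_bytes
    return entries, math.log2(entries)

MB = 1024 ** 2
GB = 1024 ** 3
for label, size in [("256 MB", 256 * MB), ("1 GB", GB), ("72 GB", 72 * GB)]:
    # 1 KB per entry is an assumed average, not a value from this thread.
    entries, cmps = expected_comparisons(size, avg_entry_bytes=1024)
    print(f"{label}: ~{entries:,} entries, ~{cmps:.0f} comparisons per lookup")
```

Under these assumptions the estimate grows only from about 18 to about 26 comparisons per lookup, so the comparison count alone probably does not explain the gap. Each comparison in a 72 GB structure may be more likely to miss the CPU caches, but that is a guess and not something this thread confirms.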

