Posted to user@phoenix.apache.org by Gagan Agrawal <ga...@xebia.com> on 2014/07/13 07:48:32 UTC

RE: 100% CPU Utilization in HBase Region Server with Phoenix

Hi,
We are using Phoenix 4 and HBase 0.98. We have just started loading HBase with 300 million records per hour via the Phoenix MR bulk loader. After around 8-9 hours (with 2.5 billion records in HBase), we found that CPU utilization on all of our region servers is almost 100%, making the cluster unusable. All of our queries are now hung and we are not getting any response. I took a thread dump of one of the region servers (attached) and found the following with respect to Phoenix. It also seems that a similar issue has been raised by someone else: https://issues.apache.org/jira/browse/PHOENIX-1081 . I am not sure whether it is exactly the same, but it looks similar. Can you please look into this and let us know the cause?

Thread 73953: (state = IN_JAVA)
- org.apache.phoenix.filter.SkipScanFilter.navigate(byte[], int, int, org.apache.phoenix.filter.SkipScanFilter$Terminate) @bci=630, line=341 (Compiled frame; information may be imprecise)
- org.apache.phoenix.filter.SkipScanFilter.filterKeyValue(org.apache.hadoop.hbase.Cell) @bci=22, line=116 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(org.apache.hadoop.hbase.KeyValue) @bci=594, line=392 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.StoreScanner.next(java.util.List, int) @bci=240, line=469 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(java.util.List, int) @bci=20, line=140 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(java.util.List, org.apache.hadoop.hbase.regionserver.KeyValueHeap, int, byte[], int, short) @bci=10, line=3848 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(java.util.List, int) @bci=253, line=3928 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(java.util.List, int) @bci=12, line=3796 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(java.util.List) @bci=6, line=3787 (Compiled frame)
- org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(org.apache.hadoop.hbase.coprocessor.ObserverContext, org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner, java.util.List, org.apache.phoenix.expression.aggregator.ServerAggregators, long) @bci=230, line=386 (Compiled frame)
- org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(org.apache.hadoop.hbase.coprocessor.ObserverContext, org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner) @bci=184, line=133 (Interpreted frame)
- org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(org.apache.hadoop.hbase.coprocessor.ObserverContext, org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner) @bci=4, line=66 (Interpreted frame)
- org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner) @bci=86, line=1663 (Compiled frame)
- org.apache.hadoop.hbase.regionserver.HRegionServer.scan(com.google.protobuf.RpcController, org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest) @bci=459, line=3093 (Compiled frame)
- org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=103, line=28861 (Interpreted frame)
- org.apache.hadoop.hbase.ipc.RpcServer.call(com.google.protobuf.BlockingService, com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.Message, org.apache.hadoop.hbase.CellScanner, long, org.apache.hadoop.hbase.monitoring.MonitoredRPCHandler) @bci=59, line=2008 (Interpreted frame)
- org.apache.hadoop.hbase.ipc.CallRunner.run() @bci=257, line=92 (Interpreted frame)
- org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(java.util.concurrent.BlockingQueue) @bci=18, line=160 (Interpreted frame)
- org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(org.apache.hadoop.hbase.ipc.SimpleRpcScheduler, java.util.concurrent.BlockingQueue) @bci=2, line=38 (Interpreted frame)
- org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run() @bci=8, line=110 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)

Thanks and Regards,
Gagan Agrawal

Re: 100% CPU Utilization in HBase Region Server with Phoenix

Posted by James Taylor <ja...@apache.org>.
Hi Gagan,
Sorry you're running into problems. You may have hit a bug in skip
scan. The skip scan filter acts as a finite state machine. If you can
isolate the row *before* this state and the incoming KeyValue that
causes this issue, then we'll have the information we need to fix it.
If you could package it as a unit test like this one, that would be
ideal: https://github.com/apache/phoenix/blob/master/phoenix-core/src/test/java/org/apache/phoenix/filter/SkipScanFilterTest.java
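
To make the "finite state machine" remark concrete, here is a toy sketch of
the general skip-scan idea. It is not Phoenix's SkipScanFilter: the class
ToySkipScan, its Action enum, and the use of plain int keys and inclusive
ranges are all assumptions made for illustration. The filter's only state is
its position in a sorted list of allowed ranges, so the position left behind
by the previous row plus the next incoming key fully determine the next step.
That is why the row before the bad state and the incoming KeyValue are enough
to reproduce a problem.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Phoenix's SkipScanFilter. Keys are ints and the
// allowed key space is a sorted list of non-overlapping inclusive ranges.
public class ToySkipScan {
    enum Action { INCLUDE, SEEK, DONE }

    private final int[][] ranges; // each entry is {low, high}, sorted by low
    private int position = 0;     // the state: index of the current range
    private int seekHint;         // valid only after filter() returns SEEK

    ToySkipScan(int[][] ranges) {
        this.ranges = ranges;
    }

    // Decide what to do with the next key, given the current state.
    Action filter(int key) {
        // Move past ranges that end before this key. Every iteration must
        // advance 'position'; a transition that fails to make progress
        // would spin here and pin a CPU core.
        while (position < ranges.length && ranges[position][1] < key) {
            position++;
        }
        if (position == ranges.length) {
            return Action.DONE; // no ranges left, the scan can stop
        }
        if (key < ranges[position][0]) {
            seekHint = ranges[position][0]; // jump to the start of the range
            return Action.SEEK;
        }
        return Action.INCLUDE;
    }

    int seekHint() {
        return seekHint;
    }

    // Drive the filter over sorted keys, the way a scanner would.
    public static List<Integer> scan(int[] sortedKeys, int[][] ranges) {
        ToySkipScan filter = new ToySkipScan(ranges);
        List<Integer> included = new ArrayList<>();
        int i = 0;
        while (i < sortedKeys.length) {
            Action action = filter.filter(sortedKeys[i]);
            if (action == Action.DONE) {
                break;
            }
            if (action == Action.INCLUDE) {
                included.add(sortedKeys[i]);
                i++;
            } else {
                // SEEK: skip ahead to the first key at or after the hint.
                int hint = filter.seekHint();
                while (i < sortedKeys.length && sortedKeys[i] < hint) {
                    i++;
                }
            }
        }
        return included;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 5, 6, 9, 12, 15};
        int[][] ranges = {{2, 3}, {9, 10}, {14, 20}};
        System.out.println(scan(keys, ranges)); // prints [2, 9, 15]
    }
}
```

In a real filter the keys are multi-column byte arrays and the transitions
are far more involved, so a repro should be built against the actual
SkipScanFilter as in the linked test.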

As a quick test, you can disable use of the skip scan by adding the
following hint to your queries:
    SELECT /*+ RANGE_SCAN */  ...
This will force a range scan to be performed instead of a skip scan.
Performance will likely be worse (substantially so if you're doing point
lookups).
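
If the queries are built in application code, the hint can be added in one
place rather than edited into every statement. The following is a minimal
sketch under stated assumptions: the class RangeScanHint, the method
withRangeScan, and the EVENTS table and HOST column in the example are
hypothetical, and only the RANGE_SCAN hint itself comes from this thread. It
uses simple string handling rather than a SQL parser, so it only covers
statements that begin with SELECT.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: places the RANGE_SCAN hint right after the
// leading SELECT keyword of a query string.
public class RangeScanHint {
    // Leading SELECT keyword, case-insensitive, optional leading whitespace.
    private static final Pattern LEADING_SELECT =
            Pattern.compile("^(\\s*SELECT)\\b", Pattern.CASE_INSENSITIVE);

    public static String withRangeScan(String sql) {
        Matcher m = LEADING_SELECT.matcher(sql);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a SELECT statement: " + sql);
        }
        String rest = sql.substring(m.end());
        // Leave queries that already carry a hint comment untouched.
        if (rest.trim().startsWith("/*+")) {
            return sql;
        }
        return m.group(1) + " /*+ RANGE_SCAN */" + rest;
    }

    public static void main(String[] args) {
        // EVENTS and HOST are made-up names for illustration.
        System.out.println(
                withRangeScan("SELECT COUNT(*) FROM EVENTS WHERE HOST IN ('a', 'b')"));
    }
}
```

The resulting string can then be passed to the usual JDBC calls. Once the
skip scan issue is resolved, removing the helper call restores the default
plan.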

Thanks,
James
