Posted to user@hbase.apache.org by Nanheng Wu <na...@gmail.com> on 2011/03/02 02:07:09 UTC

What's the region server doing?

My cluster (10 nodes, hbase-0.20.6 + hadoop 0.20.2) is very, very slow
for any operation like disable table or delete. The master's thread dump
says they are blocked by the metaScanner thread. When I looked at the
log file on the .META. RS there was no output at all! (INFO debug
level). J-D has been helping me on this; we pretty much figured out
that RegionManager.metaScanner is the culprit, because it's taking
around 25 minutes to scan 8K rows. What I don't get is what the region
server is actually doing during this time. There are no requests at all
on the cluster, and no RS splits either, because we just use an MR job
to output HFiles and never write again.
J-D has been really, really helpful, but I feel like I've taken too much
of his time. Below is the thread dump of the .META. RS taken while the
disable commands are blocked on the meta scanner. Can someone help me
figure out what the server is doing? Is it running any threads at all?
Thank you!

http://pastebin.com/CZQAywq3
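For scale, the figures above (about 25 minutes to scan roughly 8K rows) work out to an average of almost 190 ms per row. A minimal Python sketch of that arithmetic, using only the numbers from this report:

```python
def per_row_latency_ms(total_minutes: float, rows: int) -> float:
    """Average wall-clock time per row, in milliseconds."""
    return total_minutes * 60 * 1000 / rows


# Figures from the report above: ~25 minutes to scan ~8K rows.
print(per_row_latency_ms(25, 8000))  # 187.5
```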

Re: What's the region server doing?

Posted by Stack <st...@duboce.net>.

Can you upgrade to 0.90.x, Nanheng? It doesn't even scan the .META.
like 0.20.x does.
St.Ack

Re: What's the region server doing?

Posted by Nanheng Wu <na...@gmail.com>.

Alright, let's do that. Quick dinner first. Thanks JD!

On Tue, Mar 1, 2011 at 5:48 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> next() is the call to get the next row from a scan.
>
> Maybe you aren't looking at the right region server? If you'd like to
> speed up this debugging session, feel free to drop by the #hbase
> channel on freenode, and then we could report the results on the mailing
> list.
>
> J-D

Re: What's the region server doing?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

next() is the call to get the next row from a scan.

Maybe you aren't looking at the right region server? If you'd like to
speed up this debugging session, feel free to drop by the #hbase
channel on freenode, and then we could report the results on the mailing
list.

J-D
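Put differently, a scan is a loop of next() calls, each one fetching the next row until the scanner is exhausted, so a slow scan shows up as slow individual next() calls. The Python sketch below only illustrates that loop and one way to time each call; `timed_scan` and the `next_row` callable are hypothetical stand-ins, not HBase client APIs.

```python
import time
from typing import Callable, List, Optional, Tuple


def timed_scan(next_row: Callable[[], Optional[str]],
               slow_ms: float) -> Tuple[int, List[Tuple[int, float]]]:
    """Drive a scan by calling next_row() until it returns None.

    Returns the number of rows read and a list of (row_index, elapsed_ms)
    for each row whose next_row() call took longer than slow_ms.
    The final call that returns None is not counted as a row.
    """
    rows = 0
    slow_calls: List[Tuple[int, float]] = []
    while True:
        start = time.perf_counter()
        row = next_row()  # one "next" per row, as described above
        elapsed_ms = (time.perf_counter() - start) * 1000
        if row is None:  # scanner exhausted
            break
        if elapsed_ms > slow_ms:
            slow_calls.append((rows, elapsed_ms))
        rows += 1
    return rows, slow_calls


# Usage with a plain Python iterator standing in for a real scanner.
fake_rows = iter(["row-a", "row-b", "row-c"])
count, slow = timed_scan(lambda: next(fake_rows, None), slow_ms=100)
print(count)  # 3
```

A real client would pass its own scanner's next method as `next_row`; the iterator here only shows the shape of the loop.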

On Tue, Mar 1, 2011 at 5:43 PM, Nanheng Wu <na...@gmail.com> wrote:
> And what's "next?" .... and what's next?

Re: What's the region server doing?

Posted by Nanheng Wu <na...@gmail.com>.

And what's "next?" .... and what's next?

On Tue, Mar 1, 2011 at 5:41 PM, Nanheng Wu <na...@gmail.com> wrote:
> I just took the stack traces of both the master and the .META. RS. The
> master's still waiting on the thread that called "next", but no IPC
> Server handler on the RS has that call. Is that possible? Or have I
> just stared at this thing for too long?

Re: What's the region server doing?

Posted by Nanheng Wu <na...@gmail.com>.

I just took the stack traces of both the master and the .META. RS. The
master's still waiting on the thread that called "next", but no IPC
Server handler on the RS has that call. Is that possible? Or have I
just stared at this thing for too long?

On Tue, Mar 1, 2011 at 5:32 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Yes, and on the other side (which is the region server that hosts
> .META.) you should be able to see that call. Well, not that specific
> one, but one of them :)
>
> J-D

Re: What's the region server doing?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Yes, and on the other side (which is the region server that hosts
.META.) you should be able to see that call. Well, not that specific
one, but one of them :)

J-D

On Tue, Mar 1, 2011 at 5:30 PM, Nanheng Wu <na...@gmail.com> wrote:
> You said "next". I don't know if this is related at all, but the
> master's thread dump says the disable is blocked by the thread
> below, and it's calling next:
>
> Thread 27 (RegionManager.metaScanner):
>  State: WAITING
>  Blocked count: 69503
>  Waited count: 68805
>  Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@42fcac6
>  Stack:
>    java.lang.Object.wait(Native Method)
>    java.lang.Object.wait(Object.java:485)
>    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>    $Proxy1.next(Unknown Source)
>    org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:179)
>    org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>    org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>    org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>    org.apache.hadoop.hbase.Chore.run(Chore.java:68)

Re: What's the region server doing?

Posted by Nanheng Wu <na...@gmail.com>.

You said "next". I don't know if this is related at all, but the
master's thread dump says the disable is blocked by the thread
below, and it's calling next:

Thread 27 (RegionManager.metaScanner):
  State: WAITING
  Blocked count: 69503
  Waited count: 68805
  Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@42fcac6
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
    $Proxy1.next(Unknown Source)
    org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:179)
    org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
    org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
    org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
    org.apache.hadoop.hbase.Chore.run(Chore.java:68)
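The dump above uses a `Thread N (name):` header followed by `State:` and `Stack:` sections. Assuming exactly that layout, a minimal Python sketch can read off the thread's state and the RPC method it is blocked in (the `$ProxyN.<method>` frame), which is how the `next` call shows up here. `summarize_thread` is a hypothetical helper, not part of HBase or Hadoop.

```python
import re
from typing import Dict, Optional

HEADER = re.compile(r"^Thread (\d+) \((.*)\):$")
PROXY_FRAME = re.compile(r"^\$Proxy\d+\.(\w+)\(")


def summarize_thread(block: str) -> Dict[str, Optional[str]]:
    """Summarize one thread entry written in the layout shown above."""
    summary: Dict[str, Optional[str]] = {
        "name": None, "state": None, "rpc_method": None}
    for raw in block.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            summary["name"] = header.group(2)
        elif line.startswith("State:"):
            summary["state"] = line.split(":", 1)[1].strip()
        else:
            # The dynamic-proxy frame names the RPC the thread is waiting on.
            proxy = PROXY_FRAME.match(line)
            if proxy and summary["rpc_method"] is None:
                summary["rpc_method"] = proxy.group(1)
    return summary


# An excerpt of the metaScanner entry above.
dump = """Thread 27 (RegionManager.metaScanner):
  State: WAITING
  Stack:
    java.lang.Object.wait(Native Method)
    $Proxy1.next(Unknown Source)
"""
print(summarize_thread(dump)["rpc_method"])  # next
```

Given the full metaScanner entry, it reports the name RegionManager.metaScanner, state WAITING and RPC method next.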

On Tue, Mar 1, 2011 at 5:22 PM, Nanheng Wu <na...@gmail.com> wrote:
> Thanks man, I'll try that and post back when I find something. BTW, I
> ran the script to set the memstore flush size on .META., and now I am
> seeing a lot less writing to HDFS from the .META. RS and less
> compaction; unfortunately it's still slow. :(

Re: What's the region server doing?

Posted by Nanheng Wu <na...@gmail.com>.

Thanks man, I'll try that and post back when I find something. BTW, I
ran the script to set the memstore flush size on .META., and now I am
seeing a lot less writing to HDFS from the .META. RS and less
compaction; unfortunately it's still slow. :(

On Tue, Mar 1, 2011 at 5:15 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> In that specific jstack it's doing nothing at all, but keep in mind
> that it's only a snapshot of a precise moment in time. Try jstack'ing
> a few times and at some point you should see the threads named like
> "IPC Server handler xx on 60020" (where xx is a number) showing bigger
> stack traces with HRegionServer doing stuff like get, next, put, etc.
>
> You should also try scanning '.META.' from the shell and if it's slow,
> do the jstack'ing at the same time.
>
> J-D

Re: What's the region server doing?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

In that specific jstack it's doing nothing at all, but keep in mind
that it's only a snapshot of a precise moment in time. Try jstack'ing
a few times and at some point you should see the threads named like
"IPC Server handler xx on 60020" (where xx is a number) showing bigger
stack traces with HRegionServer doing stuff like get, next, put, etc.

You should also try scanning '.META.' from the shell and if it's slow,
do the jstack'ing at the same time.

J-D
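The advice above (take several jstacks and look for "IPC Server handler xx on 60020" threads with HRegionServer frames) can be partly automated. A minimal Python sketch, assuming the standard HotSpot jstack layout (each thread header starts with the quoted thread name, frames are `at package.Class.method(...)` lines). The helper names are hypothetical, and treating the top-most HRegionServer frame as "the method being served" is a heuristic, not something HBase guarantees.

```python
import re
from collections import Counter
from typing import Iterable, List

# Thread headers in jstack output start with the quoted thread name.
HANDLER = re.compile(r'^"IPC Server handler \d+ on \d+"')
RS_FRAME = re.compile(
    r"^at org\.apache\.hadoop\.hbase\.regionserver\.HRegionServer\.(\w+)\(")


def busy_handlers(snapshot: str) -> List[str]:
    """For one jstack snapshot, list the HRegionServer method each busy
    IPC handler thread is in (top-most HRegionServer frame per thread)."""
    methods: List[str] = []
    in_handler = False
    found = False
    for raw in snapshot.splitlines():
        line = raw.strip()
        if line.startswith('"'):  # start of a new thread entry
            in_handler = bool(HANDLER.match(line))
            found = False
        elif in_handler and not found:
            frame = RS_FRAME.match(line)
            if frame:
                methods.append(frame.group(1))
                found = True
    return methods


def tally(snapshots: Iterable[str]) -> Counter:
    """Count HRegionServer methods seen in handler threads across snapshots."""
    counts: Counter = Counter()
    for snapshot in snapshots:
        counts.update(busy_handlers(snapshot))
    return counts
```

Saving the output of `jstack <pid>` a few seconds apart and passing the file contents to `tally` gives a rough count of which HRegionServer methods the handlers were in; an empty count across many snapshots would suggest the handlers really are idle.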

On Tue, Mar 1, 2011 at 5:07 PM, Nanheng Wu <na...@gmail.com> wrote:
> My cluster (10 nodes, hbase-0.20.6 + hadoop 0.20.2) is very, very slow
> for any operation like disable table or delete. The master's thread dump
> says they are blocked by the metaScanner thread. When I looked at the
> log file on the .META. RS there was no output at all! (INFO debug
> level). J-D has been helping me on this; we pretty much figured out
> that RegionManager.metaScanner is the culprit, because it's taking
> around 25 minutes to scan 8K rows. What I don't get is what the region
> server is actually doing during this time. There are no requests at all
> on the cluster, and no RS splits either, because we just use an MR job
> to output HFiles and never write again.
> J-D has been really, really helpful, but I feel like I've taken too much
> of his time. Below is the thread dump of the .META. RS taken while the
> disable commands are blocked on the meta scanner. Can someone help me
> figure out what the server is doing? Is it running any threads at all?
> Thank you!
>
> http://pastebin.com/CZQAywq3
>