You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2010/03/13 05:24:14 UTC

slow response in hbase shell

Hi,

> We sometimes saw over 5 second delay running get in hbase 0.20.1 shell:
>
> hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> 0 row(s) in 10.1330 seconds
>
> From our 3 region servers there are a lot of such messages:
>


> 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
> regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgenre\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_container,1266944566406:
> Overloaded
> 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
> regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> domaincrawltable,,1268098564908: Overloaded
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>   40428  343638 11523230
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>   40430  343655 11307703
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>   40466  343961 11340379
>
> Was the slow response due to the load balancing ?
>

The strange thing was that after several quick responses I would see:

hbase(main):004:0> get 'ruletable', 'com.about.acne'
COLUMN
CELL
 lpm_1.0:category            timestamp=1268347483823, value=
http://acne.about.com\t9002:0.86580       086\thost\n
1 row(s) in 0.0040 seconds
hbase(main):005:0> get 'ruletable', 'com.about.acne'
COLUMN
CELL
 lpm_1.0:category            timestamp=1268347483823, value=
http://acne.about.com\t9002:0.86580       086\thost\n
1 row(s) in 0.0040 seconds
hbase(main):006:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 10.10.31.136:60020 for region
ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

But 006:60010/master.jsp refreshes quickly and shows all three
regionservers.
Don't know why hbase shell encountered NSRE.

>
> Thanks

RE: slow response in hbase shell

Posted by Jonathan Gray <jl...@streamy.com>.
Ted,

It's hard to tell without having full logs, but it seems that this is caught
in a reassignment loop.  I see things getting unassigned and reassigned over
and over.

Are there any weird network or dns issues on this cluster?  It shows avg
load at 7 not 13, so seems that there could be some extra regionserver or
something else wonky.

Have you tried restarting your cluster?

> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Saturday, March 13, 2010 1:59 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: slow response in hbase shell
> 
> Here is region server log:
> http://pastebin.com/6d1QgNXR
> 
> There was only one exception around the time I performed get:
> 2010-03-13 03:38:30,325 INFO  [regionserver/10.10.31.137:60020.worker]
> regionserver.HRegion(342): region
> domaincrawltable,com.intensedebate.s:http\x2Fsignup,1268098564908/60229
> 3480
> available; sequence id is 225656738
> 2010-03-13 03:38:30,476 ERROR [IPC Server handler 23 on 60020]
> regionserver.HRegionServer(844):
> org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,com.hoovers.www/companyindex/New_Mexico/Corrales/Telecommunic
> ations-1.html,1268084284374
>     at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:2307)
>     at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:1784)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>     at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
> 
> Here is master server log which doesn't have exception at all:
> http://pastebin.com/80949RK2
> 
> On Sat, Mar 13, 2010 at 10:10 AM, Jonathan Gray <jl...@streamy.com>
> wrote:
> 
> > Ted,
> >
> >
> >
> > Your attachments didn't come through.  Try putting them up on the web
> or
> > pastebin somewhere.
> >
> >
> >
> > What's happening in the RegionServer logs between the time that the
> get
> > works and the get doesn't work?
> >
> >
> >
> > Also, I recommend upgrading to 0.20.3, there are critical fixes.
> >
> >
> >
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Saturday, March 13, 2010 3:48 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: slow response in hbase shell
> >
> >
> >
> > We use hbase 0.20.1
> > There are 3 region servers. 13 regions on each server.
> >
> > I don't see any exception in master log.
> >
> > I was able to run 10 successful get commands before hitting the
> following:
> > I am attaching (partial) master log and region server log from
> 10.10.31.137
> >
> > hbase(main):014:0> get 'ruletable', 'com.about.acne'
> > COLUMN                       CELL
> >  lpm_1.0:category            timestamp=1268347483823,
> > value=http://acne.about.com\t9002:0.86580086\thost\n
> > 1 row(s) in 0.0120 seconds
> > hbase(main):015:0> get 'ruletable', 'com.about.acne'
> > NativeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server 10.10.31.137:60020 for region
> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,,1268083966723
> >        at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.j
> > ava:2307)
> >        at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:17
> > 84)
> >        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> >        at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl
> > .java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> >        at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
> >
> >
> >
> > On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com>
> wrote:
> >
> > Seems like something weird is going on with your regionservers and
> > balancing.
> >
> > Can you post big snippets from the regionserver and master logs?
> Have you
> > checked to see what's going on?  Is there repetitive balancing going
> on
> > that
> > never seems to reach steady-state?
> >
> > How many regions and how many nodes on which version of HBase?
> >
> >
> > > -----Original Message-----
> > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > Sent: Friday, March 12, 2010 8:24 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: slow response in hbase shell
> > >
> > > Hi,
> > >
> > > > We sometimes saw over 5 second delay running get in hbase 0.20.1
> > > shell:
> > > >
> > > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > > > 0 row(s) in 10.1330 seconds
> > > >
> > > > From our 3 region servers there are a lot of such messages:
> > > >
> > >
> > >
> > > > 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
> > > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > >
> > >
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> > >
> re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> > >
> 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> > > ontainer,1266944566406:
> > > > Overloaded
> > > > 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
> > > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > > domaincrawltable,,1268098564908: Overloaded
> > > >
> > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > >   40428  343638 11523230
> > > >
> > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > >   40430  343655 11307703
> > > >
> > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > >   40466  343961 11340379
> > > >
> > > > Was the slow response due to the load balancing ?
> > > >
> > >
> > > The strange thing was that after several quick responses I would
> see:
> > >
> > > hbase(main):004:0> get 'ruletable', 'com.about.acne'
> > > COLUMN
> > > CELL
> > >  lpm_1.0:category            timestamp=1268347483823, value=
> > > http://acne.about.com\t9002:0.86580       086\thost\n
> > > 1 row(s) in 0.0040 seconds
> > > hbase(main):005:0> get 'ruletable', 'com.about.acne'
> > > COLUMN
> > > CELL
> > >  lpm_1.0:category            timestamp=1268347483823, value=
> > > http://acne.about.com\t9002:0.86580       086\thost\n
> > > 1 row(s) in 0.0040 seconds
> > > hbase(main):006:0> get 'ruletable', 'com.about.acne'
> > > NativeException:
> > > org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > > Trying to contact region server 10.10.31.136:60020 for region
> > > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > > attempts.
> > > Exceptions:
> > > org.apache.hadoop.hbase.NotServingRegionException:
> > > org.apache.hadoop.hbase.NotServingRegionException:
> > > ruletable,,1268083966723
> > >         at
> > >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> > > ver.java:2307)
> > >         at
> > >
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> > > va:1784)
> > >         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown
> Source)
> > >         at
> > >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> > > rImpl.java:25)
> > >         at java.lang.reflect.Method.invoke(Method.java:597)
> > >         at
> > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > >         at
> > >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> > > 5)
> > >
> > > But 006:60010/master.jsp refreshes quickly and shows all three
> > > regionservers.
> > > Don't know why hbase shell encountered NSRE.
> > >
> > > >
> > > > Thanks
> >
> >
> >
> >


Re: slow response in hbase shell

Posted by Ted Yu <yu...@gmail.com>.
Here is region server log:
http://pastebin.com/6d1QgNXR

There was only one exception around the time I performed get:
2010-03-13 03:38:30,325 INFO  [regionserver/10.10.31.137:60020.worker]
regionserver.HRegion(342): region
domaincrawltable,com.intensedebate.s:http\x2Fsignup,1268098564908/602293480
available; sequence id is 225656738
2010-03-13 03:38:30,476 ERROR [IPC Server handler 23 on 60020]
regionserver.HRegionServer(844):
org.apache.hadoop.hbase.NotServingRegionException:
ruletable,com.hoovers.www/companyindex/New_Mexico/Corrales/Telecommunications-1.html,1268084284374
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Here is master server log which doesn't have exception at all:
http://pastebin.com/80949RK2

On Sat, Mar 13, 2010 at 10:10 AM, Jonathan Gray <jl...@streamy.com> wrote:

> Ted,
>
>
>
> Your attachments didn't come through.  Try putting them up on the web or
> pastebin somewhere.
>
>
>
> What's happening in the RegionServer logs between the time that the get
> works and the get doesn't work?
>
>
>
> Also, I recommend upgrading to 0.20.3, there are critical fixes.
>
>
>
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Saturday, March 13, 2010 3:48 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: slow response in hbase shell
>
>
>
> We use hbase 0.20.1
> There are 3 region servers. 13 regions on each server.
>
> I don't see any exception in master log.
>
> I was able to run 10 successful get commands before hitting the following:
> I am attaching (partial) master log and region server log from 10.10.31.137
>
> hbase(main):014:0> get 'ruletable', 'com.about.acne'
> COLUMN                       CELL
>  lpm_1.0:category            timestamp=1268347483823,
> value=http://acne.about.com\t9002:0.86580086\thost\n
> 1 row(s) in 0.0120 seconds
> hbase(main):015:0> get 'ruletable', 'com.about.acne'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.137:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.j
> ava:2307)
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:17
> 84)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
> .java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
>
>
> On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
>
> Seems like something weird is going on with your regionservers and
> balancing.
>
> Can you post big snippets from the regionserver and master logs?  Have you
> checked to see what's going on?  Is there repetitive balancing going on
> that
> never seems to reach steady-state?
>
> How many regions and how many nodes on which version of HBase?
>
>
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Friday, March 12, 2010 8:24 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: slow response in hbase shell
> >
> > Hi,
> >
> > > We sometimes saw over 5 second delay running get in hbase 0.20.1
> > shell:
> > >
> > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > > 0 row(s) in 10.1330 seconds
> > >
> > > From our 3 region servers there are a lot of such messages:
> > >
> >
> >
> > > 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > >
> > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> > ontainer,1266944566406:
> > > Overloaded
> > > 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > domaincrawltable,,1268098564908: Overloaded
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > >   40428  343638 11523230
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > >   40430  343655 11307703
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > >   40466  343961 11340379
> > >
> > > Was the slow response due to the load balancing ?
> > >
> >
> > The strange thing was that after several quick responses I would see:
> >
> > hbase(main):004:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> >  lpm_1.0:category            timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580       086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):005:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> >  lpm_1.0:category            timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580       086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):006:0> get 'ruletable', 'com.about.acne'
> > NativeException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server 10.10.31.136:60020 for region
> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > ruletable,,1268083966723
> >         at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> > ver.java:2307)
> >         at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> > va:1784)
> >         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> >         at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> > rImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> >         at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> > 5)
> >
> > But 006:60010/master.jsp refreshes quickly and shows all three
> > regionservers.
> > Don't know why hbase shell encountered NSRE.
> >
> > >
> > > Thanks
>
>
>
>

RE: slow response in hbase shell

Posted by Jonathan Gray <jl...@streamy.com>.
Ted,

 

Your attachments didn't come through.  Try putting them up on the web or
pastebin somewhere.

 

What's happening in the RegionServer logs between the time that the get
works and the get doesn't work?

 

Also, I recommend upgrading to 0.20.3, there are critical fixes.

 

From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Saturday, March 13, 2010 3:48 AM
To: hbase-user@hadoop.apache.org
Subject: Re: slow response in hbase shell

 

We use hbase 0.20.1
There are 3 region servers. 13 regions on each server.

I don't see any exception in master log.

I was able to run 10 successful get commands before hitting the following:
I am attaching (partial) master log and region server log from 10.10.31.137

hbase(main):014:0> get 'ruletable', 'com.about.acne'
COLUMN                       CELL
 lpm_1.0:category            timestamp=1268347483823,
value=http://acne.about.com\t9002:0.86580086\thost\n
1 row(s) in 0.0120 seconds
hbase(main):015:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 10.10.31.137:60020 for region
ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.j
ava:2307)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:17
84)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)



On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:

Seems like something weird is going on with your regionservers and
balancing.

Can you post big snippets from the regionserver and master logs?  Have you
checked to see what's going on?  Is there repetitive balancing going on that
never seems to reach steady-state?

How many regions and how many nodes on which version of HBase?


> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Friday, March 12, 2010 8:24 PM
> To: hbase-user@hadoop.apache.org
> Subject: slow response in hbase shell
>
> Hi,
>
> > We sometimes saw over 5 second delay running get in hbase 0.20.1
> shell:
> >
> > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > 0 row(s) in 10.1330 seconds
> >
> > From our 3 region servers there are a lot of such messages:
> >
>
>
> > 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> >
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> ontainer,1266944566406:
> > Overloaded
> > 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > domaincrawltable,,1268098564908: Overloaded
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> >   40428  343638 11523230
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> >   40430  343655 11307703
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> >   40466  343961 11340379
> >
> > Was the slow response due to the load balancing ?
> >
>
> The strange thing was that after several quick responses I would see:
>
> hbase(main):004:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
>  lpm_1.0:category            timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580       086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):005:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
>  lpm_1.0:category            timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580       086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):006:0> get 'ruletable', 'com.about.acne'
> NativeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.136:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,,1268083966723
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:2307)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:1784)
>         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
>
> But 006:60010/master.jsp refreshes quickly and shows all three
> regionservers.
> Don't know why hbase shell encountered NSRE.
>
> >
> > Thanks

 


Re: slow response in hbase shell

Posted by Stack <st...@duboce.net>.
As JG says, please update to 0.20.3.   Its a waste of everyone's time
spending our time debugging only to find the issue already resolved in
a later release.

Also as JG suggests, there is something up w/ your region balancer.
This has been going on for a while now going by the messages here and
those you've been sending me off list.  At a guess, the occasional
long delay is because a region is because region is being redeployed
elsewhere (even the redeploy is faster in 0.20.3 -- another reason so
to update).

St.Ack

On Sat, Mar 13, 2010 at 3:48 AM, Ted Yu <yu...@gmail.com> wrote:
> We use hbase 0.20.1
> There are 3 region servers. 13 regions on each server.
>
> I don't see any exception in master log.
>
> I was able to run 10 successful get commands before hitting the following:
> I am attaching (partial) master log and region server log from 10.10.31.137
>
> hbase(main):014:0> get 'ruletable', 'com.about.acne'
> COLUMN                       CELL
>  lpm_1.0:category            timestamp=1268347483823,
> value=http://acne.about.com\t9002:0.86580086\thost\n
> 1 row(s) in 0.0120 seconds
> hbase(main):015:0> get 'ruletable', 'com.about.acne'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.137:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
>
> On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
>>
>> Seems like something weird is going on with your regionservers and
>> balancing.
>>
>> Can you post big snippets from the regionserver and master logs?  Have you
>> checked to see what's going on?  Is there repetitive balancing going on
>> that
>> never seems to reach steady-state?
>>
>> How many regions and how many nodes on which version of HBase?
>>
>> > -----Original Message-----
>> > From: Ted Yu [mailto:yuzhihong@gmail.com]
>> > Sent: Friday, March 12, 2010 8:24 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: slow response in hbase shell
>> >
>> > Hi,
>> >
>> > > We sometimes saw over 5 second delay running get in hbase 0.20.1
>> > shell:
>> > >
>> > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
>> > > 0 row(s) in 10.1330 seconds
>> > >
>> > > From our 3 region servers there are a lot of such messages:
>> > >
>> >
>> >
>> > > 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
>> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
>> > >
>> > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
>> > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
>> > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
>> > ontainer,1266944566406:
>> > > Overloaded
>> > > 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
>> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
>> > > domaincrawltable,,1268098564908: Overloaded
>> > >
>> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>> > >   40428  343638 11523230
>> > >
>> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>> > >   40430  343655 11307703
>> > >
>> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>> > >   40466  343961 11340379
>> > >
>> > > Was the slow response due to the load balancing ?
>> > >
>> >
>> > The strange thing was that after several quick responses I would see:
>> >
>> > hbase(main):004:0> get 'ruletable', 'com.about.acne'
>> > COLUMN
>> > CELL
>> >  lpm_1.0:category            timestamp=1268347483823, value=
>> > http://acne.about.com\t9002:0.86580       086\thost\n
>> > 1 row(s) in 0.0040 seconds
>> > hbase(main):005:0> get 'ruletable', 'com.about.acne'
>> > COLUMN
>> > CELL
>> >  lpm_1.0:category            timestamp=1268347483823, value=
>> > http://acne.about.com\t9002:0.86580       086\thost\n
>> > 1 row(s) in 0.0040 seconds
>> > hbase(main):006:0> get 'ruletable', 'com.about.acne'
>> > NativeException:
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > Trying to contact region server 10.10.31.136:60020 for region
>> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
>> > attempts.
>> > Exceptions:
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > ruletable,,1268083966723
>> >         at
>> > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
>> > ver.java:2307)
>> >         at
>> > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
>> > va:1784)
>> >         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>> >         at
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
>> > rImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>> >         at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
>> > 5)
>> >
>> > But 006:60010/master.jsp refreshes quickly and shows all three
>> > regionservers.
>> > Don't know why hbase shell encountered NSRE.
>> >
>> > >
>> > > Thanks
>>
>
>

Re: slow response in hbase shell

Posted by Ted Yu <yu...@gmail.com>.
We use hbase 0.20.1
There are 3 region servers. 13 regions on each server.

I don't see any exception in master log.

I was able to run 10 successful get commands before hitting the following:
I am attaching (partial) master log and region server log from 10.10.31.137

hbase(main):014:0> get 'ruletable', 'com.about.acne'
COLUMN                       CELL
 lpm_1.0:category            timestamp=1268347483823, value=
http://acne.about.com\t9002:0.86580086\thost\n
1 row(s) in 0.0120 seconds
hbase(main):015:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 10.10.31.137:60020 for region
ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)


On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:

> Seems like something weird is going on with your regionservers and
> balancing.
>
> Can you post big snippets from the regionserver and master logs?  Have you
> checked to see what's going on?  Is there repetitive balancing going on
> that
> never seems to reach steady-state?
>
> How many regions and how many nodes on which version of HBase?
>
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Friday, March 12, 2010 8:24 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: slow response in hbase shell
> >
> > Hi,
> >
> > > We sometimes saw over 5 second delay running get in hbase 0.20.1
> > shell:
> > >
> > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > > 0 row(s) in 10.1330 seconds
> > >
> > > From our 3 region servers there are a lot of such messages:
> > >
> >
> >
> > > 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > >
> > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> > ontainer,1266944566406:
> > > Overloaded
> > > 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > domaincrawltable,,1268098564908: Overloaded
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > >   40428  343638 11523230
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > >   40430  343655 11307703
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > >   40466  343961 11340379
> > >
> > > Was the slow response due to the load balancing ?
> > >
> >
> > The strange thing was that after several quick responses I would see:
> >
> > hbase(main):004:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> >  lpm_1.0:category            timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580       086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):005:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> >  lpm_1.0:category            timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580       086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):006:0> get 'ruletable', 'com.about.acne'
> > NativeException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server 10.10.31.136:60020 for region
> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > ruletable,,1268083966723
> >         at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> > ver.java:2307)
> >         at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> > va:1784)
> >         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> >         at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> > rImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> >         at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> > 5)
> >
> > But 006:60010/master.jsp refreshes quickly and shows all three
> > regionservers.
> > Don't know why hbase shell encountered NSRE.
> >
> > >
> > > Thanks
>
>

RE: slow response in hbase shell

Posted by Jonathan Gray <jl...@streamy.com>.
Seems like something weird is going on with your regionservers and
balancing.

Can you post big snippets from the regionserver and master logs?  Have you
checked to see what's going on?  Is there repetitive balancing going on that
never seems to reach steady-state?

How many regions and how many nodes on which version of HBase?

> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Friday, March 12, 2010 8:24 PM
> To: hbase-user@hadoop.apache.org
> Subject: slow response in hbase shell
> 
> Hi,
> 
> > We sometimes saw over 5 second delay running get in hbase 0.20.1
> shell:
> >
> > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > 0 row(s) in 10.1330 seconds
> >
> > From our 3 region servers there are a lot of such messages:
> >
> 
> 
> > 2010-03-12 00:00:00,996 INFO  [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> >
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> ontainer,1266944566406:
> > Overloaded
> > 2010-03-12 00:00:00,997 INFO  [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > domaincrawltable,,1268098564908: Overloaded
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> >   40428  343638 11523230
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> >   40430  343655 11307703
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> >   40466  343961 11340379
> >
> > Was the slow response due to the load balancing ?
> >
> 
> The strange thing was that after several quick responses I would see:
> 
> hbase(main):004:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
>  lpm_1.0:category            timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580       086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):005:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
>  lpm_1.0:category            timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580       086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):006:0> get 'ruletable', 'com.about.acne'
> NativeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.136:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,,1268083966723
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:2307)
>         at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:1784)
>         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
> 
> But 006:60010/master.jsp refreshes quickly and shows all three
> regionservers.
> Don't know why hbase shell encountered NSRE.
> 
> >
> > Thanks