You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2010/03/13 05:24:14 UTC
slow response in hbase shell
Hi,
> We sometimes saw over 5 second delay running get in hbase 0.20.1 shell:
>
> hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> 0 row(s) in 10.1330 seconds
>
> From our 3 region servers there are a lot of such messages:
>
> 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
> regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgenre\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_container,1266944566406:
> Overloaded
> 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
> regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> domaincrawltable,,1268098564908: Overloaded
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> 40428 343638 11523230
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> 40430 343655 11307703
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> 40466 343961 11340379
>
> Was the slow response due to the load balancing ?
>
The strange thing was that after several quick responses I would see:
hbase(main):004:0> get 'ruletable', 'com.about.acne'
COLUMN
CELL
lpm_1.0:category timestamp=1268347483823, value=
http://acne.about.com\t9002:0.86580 086\thost\n
1 row(s) in 0.0040 seconds
hbase(main):005:0> get 'ruletable', 'com.about.acne'
COLUMN
CELL
lpm_1.0:category timestamp=1268347483823, value=
http://acne.about.com\t9002:0.86580 086\thost\n
1 row(s) in 0.0040 seconds
hbase(main):006:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 10.10.31.136:60020 for region
ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
But 006:60010/master.jsp refreshes quickly and shows all three
regionservers.
Don't know why hbase shell encountered NSRE.
>
> Thanks
RE: slow response in hbase shell
Posted by Jonathan Gray <jl...@streamy.com>.
Ted,
It's hard to tell without having full logs, but it seems that this is caught
in a reassignment loop. I see things getting unassigned and reassigned over
and over.
Are there any weird network or dns issues on this cluster? It shows avg
load at 7 not 13, so seems that there could be some extra regionserver or
something else wonky.
Have you tried restarting your cluster?
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Saturday, March 13, 2010 1:59 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: slow response in hbase shell
>
> Here is region server log:
> http://pastebin.com/6d1QgNXR
>
> There was only one exception around the time I performed get:
> 2010-03-13 03:38:30,325 INFO [regionserver/10.10.31.137:60020.worker]
> regionserver.HRegion(342): region
> domaincrawltable,com.intensedebate.s:http\x2Fsignup,1268098564908/60229
> 3480
> available; sequence id is 225656738
> 2010-03-13 03:38:30,476 ERROR [IPC Server handler 23 on 60020]
> regionserver.HRegionServer(844):
> org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,com.hoovers.www/companyindex/New_Mexico/Corrales/Telecommunic
> ations-1.html,1268084284374
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:2307)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:1784)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
>
> Here is master server log which doesn't have exception at all:
> http://pastebin.com/80949RK2
>
> On Sat, Mar 13, 2010 at 10:10 AM, Jonathan Gray <jl...@streamy.com>
> wrote:
>
> > Ted,
> >
> >
> >
> > Your attachments didn't come through. Try putting them up on the web
> or
> > pastebin somewhere.
> >
> >
> >
> > What's happening in the RegionServer logs between the time that the
> get
> > works and the get doesn't work?
> >
> >
> >
> > Also, I recommend upgrading to 0.20.3, there are critical fixes.
> >
> >
> >
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Saturday, March 13, 2010 3:48 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: slow response in hbase shell
> >
> >
> >
> > We use hbase 0.20.1
> > There are 3 region servers. 13 regions on each server.
> >
> > I don't see any exception in master log.
> >
> > I was able to run 10 successful get commands before hitting the
> following:
> > I am attaching (partial) master log and region server log from
> 10.10.31.137
> >
> > hbase(main):014:0> get 'ruletable', 'com.about.acne'
> > COLUMN CELL
> > lpm_1.0:category timestamp=1268347483823,
> > value=http://acne.about.com\t9002:0.86580086\thost\n
> > 1 row(s) in 0.0120 seconds
> > hbase(main):015:0> get 'ruletable', 'com.about.acne'
> > NativeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server 10.10.31.137:60020 for region
> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,,1268083966723
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.j
> > ava:2307)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:17
> > 84)
> > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl
> > .java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
> >
> >
> >
> > On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com>
> wrote:
> >
> > Seems like something weird is going on with your regionservers and
> > balancing.
> >
> > Can you post big snippets from the regionserver and master logs?
> Have you
> > checked to see what's going on? Is there repetitive balancing going
> on
> > that
> > never seems to reach steady-state?
> >
> > How many regions and how many nodes on which version of HBase?
> >
> >
> > > -----Original Message-----
> > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > Sent: Friday, March 12, 2010 8:24 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: slow response in hbase shell
> > >
> > > Hi,
> > >
> > > > We sometimes saw over 5 second delay running get in hbase 0.20.1
> > > shell:
> > > >
> > > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > > > 0 row(s) in 10.1330 seconds
> > > >
> > > > From our 3 region servers there are a lot of such messages:
> > > >
> > >
> > >
> > > > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
> > > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > >
> > >
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> > >
> re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> > >
> 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> > > ontainer,1266944566406:
> > > > Overloaded
> > > > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
> > > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > > domaincrawltable,,1268098564908: Overloaded
> > > >
> > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > > 40428 343638 11523230
> > > >
> > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > > 40430 343655 11307703
> > > >
> > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > > 40466 343961 11340379
> > > >
> > > > Was the slow response due to the load balancing ?
> > > >
> > >
> > > The strange thing was that after several quick responses I would
> see:
> > >
> > > hbase(main):004:0> get 'ruletable', 'com.about.acne'
> > > COLUMN
> > > CELL
> > > lpm_1.0:category timestamp=1268347483823, value=
> > > http://acne.about.com\t9002:0.86580 086\thost\n
> > > 1 row(s) in 0.0040 seconds
> > > hbase(main):005:0> get 'ruletable', 'com.about.acne'
> > > COLUMN
> > > CELL
> > > lpm_1.0:category timestamp=1268347483823, value=
> > > http://acne.about.com\t9002:0.86580 086\thost\n
> > > 1 row(s) in 0.0040 seconds
> > > hbase(main):006:0> get 'ruletable', 'com.about.acne'
> > > NativeException:
> > > org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > > Trying to contact region server 10.10.31.136:60020 for region
> > > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > > attempts.
> > > Exceptions:
> > > org.apache.hadoop.hbase.NotServingRegionException:
> > > org.apache.hadoop.hbase.NotServingRegionException:
> > > ruletable,,1268083966723
> > > at
> > >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> > > ver.java:2307)
> > > at
> > >
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> > > va:1784)
> > > at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown
> Source)
> > > at
> > >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> > > rImpl.java:25)
> > > at java.lang.reflect.Method.invoke(Method.java:597)
> > > at
> > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > > at
> > >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> > > 5)
> > >
> > > But 006:60010/master.jsp refreshes quickly and shows all three
> > > regionservers.
> > > Don't know why hbase shell encountered NSRE.
> > >
> > > >
> > > > Thanks
> >
> >
> >
> >
Re: slow response in hbase shell
Posted by Ted Yu <yu...@gmail.com>.
Here is region server log:
http://pastebin.com/6d1QgNXR
There was only one exception around the time I performed get:
2010-03-13 03:38:30,325 INFO [regionserver/10.10.31.137:60020.worker]
regionserver.HRegion(342): region
domaincrawltable,com.intensedebate.s:http\x2Fsignup,1268098564908/602293480
available; sequence id is 225656738
2010-03-13 03:38:30,476 ERROR [IPC Server handler 23 on 60020]
regionserver.HRegionServer(844):
org.apache.hadoop.hbase.NotServingRegionException:
ruletable,com.hoovers.www/companyindex/New_Mexico/Corrales/Telecommunications-1.html,1268084284374
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
Here is master server log which doesn't have exception at all:
http://pastebin.com/80949RK2
On Sat, Mar 13, 2010 at 10:10 AM, Jonathan Gray <jl...@streamy.com> wrote:
> Ted,
>
>
>
> Your attachments didn't come through. Try putting them up on the web or
> pastebin somewhere.
>
>
>
> What's happening in the RegionServer logs between the time that the get
> works and the get doesn't work?
>
>
>
> Also, I recommend upgrading to 0.20.3, there are critical fixes.
>
>
>
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Saturday, March 13, 2010 3:48 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: slow response in hbase shell
>
>
>
> We use hbase 0.20.1
> There are 3 region servers. 13 regions on each server.
>
> I don't see any exception in master log.
>
> I was able to run 10 successful get commands before hitting the following:
> I am attaching (partial) master log and region server log from 10.10.31.137
>
> hbase(main):014:0> get 'ruletable', 'com.about.acne'
> COLUMN CELL
> lpm_1.0:category timestamp=1268347483823,
> value=http://acne.about.com\t9002:0.86580086\thost\n
> 1 row(s) in 0.0120 seconds
> hbase(main):015:0> get 'ruletable', 'com.about.acne'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.137:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.j
> ava:2307)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:17
> 84)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
> .java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
>
>
> On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
>
> Seems like something weird is going on with your regionservers and
> balancing.
>
> Can you post big snippets from the regionserver and master logs? Have you
> checked to see what's going on? Is there repetitive balancing going on
> that
> never seems to reach steady-state?
>
> How many regions and how many nodes on which version of HBase?
>
>
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Friday, March 12, 2010 8:24 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: slow response in hbase shell
> >
> > Hi,
> >
> > > We sometimes saw over 5 second delay running get in hbase 0.20.1
> > shell:
> > >
> > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > > 0 row(s) in 10.1330 seconds
> > >
> > > From our 3 region servers there are a lot of such messages:
> > >
> >
> >
> > > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > >
> > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> > ontainer,1266944566406:
> > > Overloaded
> > > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > domaincrawltable,,1268098564908: Overloaded
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > 40428 343638 11523230
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > 40430 343655 11307703
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > 40466 343961 11340379
> > >
> > > Was the slow response due to the load balancing ?
> > >
> >
> > The strange thing was that after several quick responses I would see:
> >
> > hbase(main):004:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> > lpm_1.0:category timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580 086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):005:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> > lpm_1.0:category timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580 086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):006:0> get 'ruletable', 'com.about.acne'
> > NativeException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server 10.10.31.136:60020 for region
> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > ruletable,,1268083966723
> > at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> > ver.java:2307)
> > at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> > va:1784)
> > at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> > at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> > rImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> > 5)
> >
> > But 006:60010/master.jsp refreshes quickly and shows all three
> > regionservers.
> > Don't know why hbase shell encountered NSRE.
> >
> > >
> > > Thanks
>
>
>
>
RE: slow response in hbase shell
Posted by Jonathan Gray <jl...@streamy.com>.
Ted,
Your attachments didn't come through. Try putting them up on the web or
pastebin somewhere.
What's happening in the RegionServer logs between the time that the get
works and the get doesn't work?
Also, I recommend upgrading to 0.20.3, there are critical fixes.
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Saturday, March 13, 2010 3:48 AM
To: hbase-user@hadoop.apache.org
Subject: Re: slow response in hbase shell
We use hbase 0.20.1
There are 3 region servers. 13 regions on each server.
I don't see any exception in master log.
I was able to run 10 successful get commands before hitting the following:
I am attaching (partial) master log and region server log from 10.10.31.137
hbase(main):014:0> get 'ruletable', 'com.about.acne'
COLUMN CELL
lpm_1.0:category timestamp=1268347483823,
value=http://acne.about.com\t9002:0.86580086\thost\n
1 row(s) in 0.0120 seconds
hbase(main):015:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 10.10.31.137:60020 for region
ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.j
ava:2307)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:17
84)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
Seems like something weird is going on with your regionservers and
balancing.
Can you post big snippets from the regionserver and master logs? Have you
checked to see what's going on? Is there repetitive balancing going on that
never seems to reach steady-state?
How many regions and how many nodes on which version of HBase?
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Friday, March 12, 2010 8:24 PM
> To: hbase-user@hadoop.apache.org
> Subject: slow response in hbase shell
>
> Hi,
>
> > We sometimes saw over 5 second delay running get in hbase 0.20.1
> shell:
> >
> > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > 0 row(s) in 10.1330 seconds
> >
> > From our 3 region servers there are a lot of such messages:
> >
>
>
> > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> >
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> ontainer,1266944566406:
> > Overloaded
> > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > domaincrawltable,,1268098564908: Overloaded
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > 40428 343638 11523230
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > 40430 343655 11307703
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > 40466 343961 11340379
> >
> > Was the slow response due to the load balancing ?
> >
>
> The strange thing was that after several quick responses I would see:
>
> hbase(main):004:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
> lpm_1.0:category timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580 086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):005:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
> lpm_1.0:category timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580 086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):006:0> get 'ruletable', 'com.about.acne'
> NativeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.136:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,,1268083966723
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:2307)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:1784)
> at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
>
> But 006:60010/master.jsp refreshes quickly and shows all three
> regionservers.
> Don't know why hbase shell encountered NSRE.
>
> >
> > Thanks
Re: slow response in hbase shell
Posted by Stack <st...@duboce.net>.
As JG says, please update to 0.20.3. Its a waste of everyone's time
spending our time debugging only to find the issue already resolved in
a later release.
Also as JG suggests, there is something up w/ your region balancer.
This has been going on for a while now going by the messages here and
those you've been sending me off list. At a guess, the occasional
long delay is because a region is because region is being redeployed
elsewhere (even the redeploy is faster in 0.20.3 -- another reason so
to update).
St.Ack
On Sat, Mar 13, 2010 at 3:48 AM, Ted Yu <yu...@gmail.com> wrote:
> We use hbase 0.20.1
> There are 3 region servers. 13 regions on each server.
>
> I don't see any exception in master log.
>
> I was able to run 10 successful get commands before hitting the following:
> I am attaching (partial) master log and region server log from 10.10.31.137
>
> hbase(main):014:0> get 'ruletable', 'com.about.acne'
> COLUMN CELL
> lpm_1.0:category timestamp=1268347483823,
> value=http://acne.about.com\t9002:0.86580086\thost\n
> 1 row(s) in 0.0120 seconds
> hbase(main):015:0> get 'ruletable', 'com.about.acne'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.137:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
>
> On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
>>
>> Seems like something weird is going on with your regionservers and
>> balancing.
>>
>> Can you post big snippets from the regionserver and master logs? Have you
>> checked to see what's going on? Is there repetitive balancing going on
>> that
>> never seems to reach steady-state?
>>
>> How many regions and how many nodes on which version of HBase?
>>
>> > -----Original Message-----
>> > From: Ted Yu [mailto:yuzhihong@gmail.com]
>> > Sent: Friday, March 12, 2010 8:24 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: slow response in hbase shell
>> >
>> > Hi,
>> >
>> > > We sometimes saw over 5 second delay running get in hbase 0.20.1
>> > shell:
>> > >
>> > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
>> > > 0 row(s) in 10.1330 seconds
>> > >
>> > > From our 3 region servers there are a lot of such messages:
>> > >
>> >
>> >
>> > > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
>> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
>> > >
>> > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
>> > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
>> > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
>> > ontainer,1266944566406:
>> > > Overloaded
>> > > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
>> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
>> > > domaincrawltable,,1268098564908: Overloaded
>> > >
>> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>> > > 40428 343638 11523230
>> > >
>> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>> > > 40430 343655 11307703
>> > >
>> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>> > > 40466 343961 11340379
>> > >
>> > > Was the slow response due to the load balancing ?
>> > >
>> >
>> > The strange thing was that after several quick responses I would see:
>> >
>> > hbase(main):004:0> get 'ruletable', 'com.about.acne'
>> > COLUMN
>> > CELL
>> > lpm_1.0:category timestamp=1268347483823, value=
>> > http://acne.about.com\t9002:0.86580 086\thost\n
>> > 1 row(s) in 0.0040 seconds
>> > hbase(main):005:0> get 'ruletable', 'com.about.acne'
>> > COLUMN
>> > CELL
>> > lpm_1.0:category timestamp=1268347483823, value=
>> > http://acne.about.com\t9002:0.86580 086\thost\n
>> > 1 row(s) in 0.0040 seconds
>> > hbase(main):006:0> get 'ruletable', 'com.about.acne'
>> > NativeException:
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > Trying to contact region server 10.10.31.136:60020 for region
>> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
>> > attempts.
>> > Exceptions:
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > ruletable,,1268083966723
>> > at
>> > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
>> > ver.java:2307)
>> > at
>> > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
>> > va:1784)
>> > at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>> > at
>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
>> > rImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>> > at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
>> > 5)
>> >
>> > But 006:60010/master.jsp refreshes quickly and shows all three
>> > regionservers.
>> > Don't know why hbase shell encountered NSRE.
>> >
>> > >
>> > > Thanks
>>
>
>
Re: slow response in hbase shell
Posted by Ted Yu <yu...@gmail.com>.
We use hbase 0.20.1
There are 3 region servers. 13 regions on each server.
I don't see any exception in master log.
I was able to run 10 successful get commands before hitting the following:
I am attaching (partial) master log and region server log from 10.10.31.137
hbase(main):014:0> get 'ruletable', 'com.about.acne'
COLUMN CELL
lpm_1.0:category timestamp=1268347483823, value=
http://acne.about.com\t9002:0.86580086\thost\n
1 row(s) in 0.0120 seconds
hbase(main):015:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server 10.10.31.137:60020 for region
ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Seems like something weird is going on with your regionservers and
> balancing.
>
> Can you post big snippets from the regionserver and master logs? Have you
> checked to see what's going on? Is there repetitive balancing going on
> that
> never seems to reach steady-state?
>
> How many regions and how many nodes on which version of HBase?
>
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Friday, March 12, 2010 8:24 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: slow response in hbase shell
> >
> > Hi,
> >
> > > We sometimes saw over 5 second delay running get in hbase 0.20.1
> > shell:
> > >
> > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > > 0 row(s) in 10.1330 seconds
> > >
> > > From our 3 region servers there are a lot of such messages:
> > >
> >
> >
> > > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > >
> > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> > ontainer,1266944566406:
> > > Overloaded
> > > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
> > > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > > domaincrawltable,,1268098564908: Overloaded
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > 40428 343638 11523230
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > 40430 343655 11307703
> > >
> > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > > 40466 343961 11340379
> > >
> > > Was the slow response due to the load balancing ?
> > >
> >
> > The strange thing was that after several quick responses I would see:
> >
> > hbase(main):004:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> > lpm_1.0:category timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580 086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):005:0> get 'ruletable', 'com.about.acne'
> > COLUMN
> > CELL
> > lpm_1.0:category timestamp=1268347483823, value=
> > http://acne.about.com\t9002:0.86580 086\thost\n
> > 1 row(s) in 0.0040 seconds
> > hbase(main):006:0> get 'ruletable', 'com.about.acne'
> > NativeException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server 10.10.31.136:60020 for region
> > ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > ruletable,,1268083966723
> > at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> > ver.java:2307)
> > at
> > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> > va:1784)
> > at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> > at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> > rImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at
> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> > 5)
> >
> > But 006:60010/master.jsp refreshes quickly and shows all three
> > regionservers.
> > Don't know why hbase shell encountered NSRE.
> >
> > >
> > > Thanks
>
>
RE: slow response in hbase shell
Posted by Jonathan Gray <jl...@streamy.com>.
Seems like something weird is going on with your regionservers and
balancing.
Can you post big snippets from the regionserver and master logs? Have you
checked to see what's going on? Is there repetitive balancing going on that
never seems to reach steady-state?
How many regions and how many nodes on which version of HBase?
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Friday, March 12, 2010 8:24 PM
> To: hbase-user@hadoop.apache.org
> Subject: slow response in hbase shell
>
> Hi,
>
> > We sometimes saw over 5 second delay running get in hbase 0.20.1
> shell:
> >
> > hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> > 0 row(s) in 10.1330 seconds
> >
> > From our 3 region servers there are a lot of such messages:
> >
>
>
> > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> >
> crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen
> re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x
> 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c
> ontainer,1266944566406:
> > Overloaded
> > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020]
> > regionserver.HRegionServer(493): MSG_REGION_CLOSE:
> > domaincrawltable,,1268098564908: Overloaded
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > 40428 343638 11523230
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > 40430 343655 11307703
> >
> > grep Overloaded hbase-hbaseadmin-regionserver.log | wc
> > 40466 343961 11340379
> >
> > Was the slow response due to the load balancing ?
> >
>
> The strange thing was that after several quick responses I would see:
>
> hbase(main):004:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
> lpm_1.0:category timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580 086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):005:0> get 'ruletable', 'com.about.acne'
> COLUMN
> CELL
> lpm_1.0:category timestamp=1268347483823, value=
> http://acne.about.com\t9002:0.86580 086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):006:0> get 'ruletable', 'com.about.acne'
> NativeException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server 10.10.31.136:60020 for region
> ruletable,,1268083966723, row 'com.about.acne', but failed after 5
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> ruletable,,1268083966723
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:2307)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja
> va:1784)
> at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso
> rImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91
> 5)
>
> But 006:60010/master.jsp refreshes quickly and shows all three
> regionservers.
> Don't know why hbase shell encountered NSRE.
>
> >
> > Thanks