You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by Gaojinchao <ga...@huawei.com> on 2011/04/14 04:55:04 UTC

a lots of error about Region has been OPEN for too long

In hbase version 0.90.1 .

Is there any experience ?

Hmaster Logs :
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything

-----邮件原件-----
发件人: Ted Yu [mailto:yuzhihong@gmail.com] 
发送时间: 2011年4月14日 4:42
收件人: user@hbase.apache.org
主题: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job

My comment on HBASE-3734 at 05/Apr/11 05:20 is related to this discussion.

On Wed, Apr 13, 2011 at 1:25 PM, Ruben Quintero <rf...@yahoo.com> wrote:

> Venkatesh, I guess the two quick and dirty solutions are:
>
> - Call deleteAllConnections(bool) at the end of your MapReduce jobs, or
> periodically. If you have no other tables or pools, etc. open, then no
> problem.
> If you do, they'll start throwing IOExceptions, but you can re-instantiate
> them
> with a new config and then continue as usual. (You do have to change the
> config
> or it'll simply grab the closed, cached one from the HCM).
>
> - As J-D said, subclasss TIF and basically copy the old setConf, except
> don't
> clone the conf that gets sent to the table.
>
> Each one has a downside and are definitely not ideal, but if you either
> don't
> modify the config in your job or don't have any other important hbase
> connections, then you can use the appropriate one.
>
>
> Thanks for the assistance, J-D. It's great that these forums are active and
> helpful.
>
> - Ruben
>
>
>
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: user@hbase.apache.org
> Sent: Wed, April 13, 2011 3:50:42 PM
> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>
> Yeah for a JVM running forever it won't work.
>
> If you know for a fact that the configuration passed to TIF won't be
> changed then you can subclass it and override setConf to not clone the
> conf.
>
> J-D
>
> On Wed, Apr 13, 2011 at 12:45 PM, Ruben Quintero <rf...@yahoo.com>
> wrote:
> > The problem is the connections are never closed... so they just keep
> piling up
> > until it hits the max. My max is at 400 right now, so after 14-15 hours
> of
> > running, it gets stuck in an endless connection retry.
> >
> > I saw that the HConnectionManager will kick older HConnections out, but
> the
> > problem is that their ZooKeeper threads continue on. Those need to be
> >explicitly
> > closed.
> >
> >
> >
> > Again, this is only an issue inside JVMs set to run forever, like
> Venkatesh
> > said, because that's when the orphaned ZK connections will have a chance
> to
> > build up to whatever your maximum is. Setting that higher and higher is
> just
> > prolonging uptime before the eventual crash. It's essentially a memory
> > (connection) leak within TableInputFormat, since there is no way that I
> can
> see
> > to properly access and close those spawned connections.
> >
> > One question for you, JD: Inside of TableInputFormat.setConf, does the
> > Configuration need to be cloned? (i.e. setHTable(new HTable(new
> > Configuration(conf), tableName)); ). I'm guessing this is to prevent
> changes
> > within the job from affecting the table and vice-versa...but if it
> weren't
> > cloned, then you could use the job configuration (job.getConfiguration())
> to
> > close the connection....
> >
> > Other quick fixes that I can think of, none of which are very pretty:
> > 1 - Just call deleteAllConnections(bool), and have any other processes
> using
> > HConnections recover from that.
> > 2 - Make the static HBASE_INSTANCES map accessible (public).... then you
> could
> > iterate through open connections and try to match configs....
> >
> > Venkatesh - unless you have other processes in your JVM accessing HBase
> (I
> have
> > one), #1 might be the best bet.
> >
> > - Ruben
> >
> >
> >
> > ________________________________
> > From: Jean-Daniel Cryans <jd...@apache.org>
> > To: user@hbase.apache.org
> > Sent: Wed, April 13, 2011 3:22:48 PM
> > Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
> >
> > Like I said, it's a zookeeper configuration that you can change. If
> > hbase is managing your zookeeper then set
> > hbase.zookeeper.property.maxClientCnxns to something higher than 30
> > and restart the zk server (can be done while hbase is running).
> >
> > J-D
> >
> > On Wed, Apr 13, 2011 at 12:04 PM, Venkatesh <vr...@aol.com>
> wrote:
> >> Reuben:
> >> Yes..I've the exact same issue now..& I'm also kicking off from another
> jvm
> >>that runs for ever..
> >> I don't have an alternate solution..either modify hbase code (or) modify
> my
> >>code to kick off
> >> as a standalone jvm (or) hopefully 0.90.3 release soon :)
> >> J-D/St.Ack may have some suggestions
> >>
> >>
> >> V
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Ruben Quintero <rf...@yahoo.com>
> >> To: user@hbase.apache.org
> >> Sent: Wed, Apr 13, 2011 2:39 pm
> >> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce
> job
> >>
> >>
> >> The problem I'm having is in getting the conf that is used to init the
> table
> >>
> >> within TableInputFormat. That's the one that's leaving open ZK
> connections
> for
> >>
> >> me.
> >>
> >>
> >>
> >> Following the code through, TableInputFormat initializes its HTable with
> new
> >>
> >> Configuration(new JobConf(conf)), where conf is the config I pass in via
> job
> >>
> >> initiation. I don't see a way of getting the initalized TableInputFormat
> in
> >>
> >> order to then get its table and its config to be able to properly close
> that
> >>
> >> connection. Cloned configs don't appear to produce similar hashes,
> either.
> The
> >>
> >> only other option I'm left with is closing all connections, but that
> disrupts
> >>
> >> things across the board.
> >>
> >>
> >>
> >>
> >>
> >> For MapReduce jobs run in their own JVM, this wouldn't be much of an
> issue,
> as
> >>
> >> the connection would just be closed on completion, but in my case (our
> code
> >>
> >> triggers the jobs internally), they simply pile up until the
> ConnectionLoss
> >>hits
> >>
> >>
> >>
> >> due to too many ZK connections.
> >>
> >>
> >>
> >> Am I missing a way to get that buried table's config, or another way to
> kill
> >>the
> >>
> >>
> >>
> >> orphaned connections?
> >>
> >>
> >>
> >> - Ruben
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >>
> >> From: Venkatesh <vr...@aol.com>
> >>
> >> To: user@hbase.apache.org
> >>
> >> Sent: Wed, April 13, 2011 10:20:50 AM
> >>
> >> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce
> job
> >>
> >>
> >>
> >> Thanks J-D
> >>
> >> I made sure to pass conf objects around in places where I was n't..
> >>
> >> will give it a try
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >>
> >> From: Jean-Daniel Cryans <jd...@apache.org>
> >>
> >> To: user@hbase.apache.org
> >>
> >> Sent: Tue, Apr 12, 2011 6:40 pm
> >>
> >> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce
> job
> >>
> >>
> >>
> >>
> >>
> >> Yes there are a few places like that. Also when you create new
> >>
> >>
> >>
> >> HTables, you should also close their connections (this is not done in
> >>
> >>
> >>
> >> htable.close).
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> See HTable's javadoc which says:
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Instances of HTable passed the same Configuration instance will share
> >>
> >>
> >>
> >> connections to servers out on the cluster and to the zookeeper
> >>
> >>
> >>
> >> ensemble as well as caches of region locations. This is usually a
> >>
> >>
> >>
> >> *good* thing. This happens because they will all share the same
> >>
> >>
> >>
> >> underlying HConnection instance. See HConnectionManager for more on
> >>
> >>
> >>
> >> how this mechanism works.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> and it points to HCM which has more information:
> >>
> >>
> >>
> >>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
> >>
> >>l
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> J-D
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Apr 12, 2011 at 3:09 PM, Ruben Quintero <rf...@yahoo.com>
> wrote:
> >>
> >>
> >>
> >>> I'm running into the same issue, but did some poking around and it
> seems
> that
> >>
> >>
> >>
> >>> Zookeeper connections are being left open by an HBase internal.
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> Basically, I'm running a mapreduce job within another program, and
> noticed
> in
> >>
> >>
> >>
> >>> the logs that every time the job is run, a connection is open, but I
> never
> > see
> >>
> >>
> >>
> >>> it closed again. The connection is opened within the job.submit().
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> I looked closer and checked the jstack after running it for just under
> an
> >>
> >>
> >>
> >> hour,
> >>
> >>
> >>
> >>> and sure enough there are a ton of Zookeeper threads just sitting
> there.
> >>
> >>
> >>
> >> Here's
> >>
> >>
> >>
> >>> a pastebin link: http://pastebin.com/MccEuvrc
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> I'm running 0.90.0 right now.
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> - Ruben
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> ________________________________
> >>
> >>
> >>
> >>> From: Jean-Daniel Cryans <jd...@apache.org>
> >>
> >>
> >>
> >>> To: user@hbase.apache.org
> >>
> >>
> >>
> >>> Sent: Tue, April 12, 2011 4:23:05 PM
> >>
> >>
> >>
> >>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce
> job
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> It's more in the vain of
> >>
> >>
> >>
> >>> https://issues.apache.org/jira/browse/HBASE-3755 and
> >>
> >>
> >>
> >>> https://issues.apache.org/jira/browse/HBASE-3771
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> Basically 0.90 has a regression regarding the handling of zookeeper
> >>
> >>
> >>
> >>> connections that make it that you have to be super careful not to have
> >>
> >>
> >>
> >>> more than 30 per machine (each new Configuration is one new ZK
> >>
> >>
> >>
> >>> connection). Upping your zookeeper max connection config should get
> >>
> >>
> >>
> >>> rid of your issues since you only get it occasionally.
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> J-D
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>> On Tue, Apr 12, 2011 at 7:59 AM, Venkatesh <vr...@aol.com>
> wrote:
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>  I get this occasionally..(not all the time)..Upgrading from 0.20.6 to
> > 0.90.2
> >>
> >>
> >>
> >>>> Is this issue same as this JIRA
> >>
> >>
> >>
> >>>> https://issues.apache.org/jira/browse/HBASE-3578
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>> I'm using HBaseConfiguration.create() & setting that in job
> >>
> >>
> >>
> >>>> thx
> >>
> >>
> >>
> >>>> v
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>  2011-04-12 02:13:06,870 ERROR Timer-0
> >>
> >>
> >>
> >>>>org.apache.hadoop.hbase.mapreduce.TableInputFormat -
> >>
> >>
> >>
> >>>>org.apache.hadoop.hbase.ZooKeeperConnectionException:
> >>
> >>
> >>
> >>>>org.apache.hadoop.hbase.ZooKeeperConnectionException:
> >>
> >>
> >>
> >>>>org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode
> >>
> >>
> >>
> >> =
> >>
> >>
> >>
> >>>>ConnectionLoss for /hbase        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
> >>>>
> >>>>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
> >>>>
> >>>>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
> >>>>
> >>>>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
> >>>>
> >>>>
> >>
> >>>>
> >>
> >>
> >>
> >>>>        at
> org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
> >>
> >>
> >>
> >>>>        at
> org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
> >>>>
> >>>>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
> >>>>org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
> >>> org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
> >>
> >>
> >>
> >>>>        at
> >>
> >>
> >>
>
> >>>>org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> >>
> >>
> >>
> >>>>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> >>
> >>
> >>
> >>>>        at
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>
>
>

Re: a lots of error about Region has been OPEN for too long

Posted by Stack <st...@duboce.net>.

On Sat, Apr 16, 2011 at 12:27 AM, Gaojinchao <ga...@huawei.com> wrote:
> Reproducing operarion(Region has been OPEN for too long):
> 1. some region server(has 1882 regions ) crashed and wait balance finished.

Thats a lot of regions.  You should up your region size.


> 2.stop hbase cluster

Had server recovery completed I wonder?  Had all regions from crashed
server been handed out?


> 3.start cluster.(it happened)
>

Are the opening too long messages coming because we were still
assigning regions on shutdown.  When we came back up, we carried on w/
assignments but the restart probably took a while on your cluster --
given as you have ~2k regions per server -- and so assignment timed
out.  Also, on startup, if lots of regions, the bulk assign on startup
may not complete before the timer cuts back in and you'll see some
timeouts on assignment.  Thats ok as long as we eventually assign out
all regions.

Update to 0.90.2 too?  There were some fixes around assignment that
may help here.

St.Ack

> My case is different for yours.
> Hmaster had received RS_ZK_REGION_OPENED. But processing OpenedRegionHandler is slow.
> It seems no bugs. I try to change parameter hbase.master.executor.openregion.threads.
>
>
> 2011-04-16 09:48:48,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302918210943, region=00621f2d5f3b0f7949bbe638d78e96a5
> 2011-04-16 09:49:20,770 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  ufdrA,5198613861538448,1301646236548.00621f2d5f3b0f7949bbe638d78e96a5. state=OPEN, ts=1302918528615
> 2011-04-16 09:49:29,371 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 00621f2d5f3b0f7949bbe638d78e96a5; deleting unassigned node
> 2011-04-16 09:49:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f5bdad8910000-0x2f5bdad8910000-0x2f5bdad8910000 Deleting existing unassigned node for 00621f2d5f3b0f7949bbe638d78e96a5 that is in expected state RS_ZK_REGION_OPENED
>
> -----邮件原件-----
> 发件人: jdcryans@gmail.com [mailto:jdcryans@gmail.com] 代表 Jean-Daniel Cryans
> 发送时间: 2011年4月15日 3:20
> 收件人: user@hbase.apache.org
> 主题: Re: a lots of error about Region has been OPEN for too long
>
> Taking a more deeper look at the code, the assumption I had that the
> message was sitting in the queue for too long is erroneous. The reason
> is that when the master says "Handling
> transition=RS_ZK_REGION_OPENED", it updates the timestamp that we use
> to keep track of the timeout which means that unless it stayed more
> than 60 seconds in the queue (it took only 3 here it seems), the issue
> is elsewhere.
>
> So logically the other reason would be that the master got the
> information too late, since we use the timestamp provided by the
> region server. Looking in the region server log, I was able to find:
>
> 2011-04-05 15:11:12,946 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x22e286f0b9c98f7-0x22e286f0b9c98f7 Successfully
> transitioned node 0db7d1f58e4fced0a371aded0ddec281 from
> RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
>
> This is 1 minute 7 seconds before the master got the message (I
> verified that our clocks are in perfect sync). This means this is
> another queue that's too slow (the ones that processes all the events
> coming from zookeeper).
>
> I'll keep digging.
>
> J-D
>
> On Thu, Apr 14, 2011 at 10:58 AM, Jean-Daniel Cryans
> <jd...@apache.org> wrote:
>> (please don't leave unrelated discussions at the tail of your emails)
>>
>> So I thought I never got that issue but wanted to make sure so I
>> grepped my logs and indeed saw I got it, so I what I did is that I
>> grepped the name of one of the regions that got the issue and looked
>> at what was happening at that time (which you should do in the
>> future). I see something like this:
>>
>> 2011-04-05 15:12:19,037 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>> master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from
>> znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set
>> watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281.,
>> server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
>> 2011-04-05 15:12:19,037 DEBUG
>> org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191,
>> region=0db7d1f58e4fced0a371aded0ddec281
>> ...
>> 2011-04-05 15:12:19,585 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Regions in
>> transition timed out:
>> tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281.
>> state=OPEN, ts=1302041472920
>> 2011-04-05 15:12:19,585 ERROR
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
>> for too long, we don't know where region was opened so can't do
>> anything
>> ...
>> 2011-04-05 15:12:22,504 DEBUG
>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
>> OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned
>> node
>>
>>
>> So if I understand this correctly, the master already got the message
>> via ZooKeeper but it stayed in a queue for just long enough that the
>> RIT timed out and finally the OpenedRegionHandler was able to process
>> it. So in the end nothing looks broken, it just means that the master
>> is processing a LOT of regions being opened, while it also took the
>> region server a long time to get the region opened.
>>
>> There are currently a few states that don't get refreshed in ZK, for
>> example when a region is sitting in the region server's queue of
>> regions to be opened. Very often, when there's a lot of regions to
>> open (and worse if the RS has to replay recovered edits), some regions
>> in that state will timeout. This needs more thinking.
>>
>> J-D
>>
>> 2011/4/13 Gaojinchao <ga...@huawei.com>:
>>> In hbase version 0.90.1 .
>>>
>>> Is there any experience ?
>>>
>>> Hmaster Logs :
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>>
>>
>

Re: a lots of error about Region has been OPEN for too long

Posted by Gaojinchao <ga...@huawei.com>.

Sorry. Last mail has some mistakes.
I try to reproduce by logs.
Hbase version: 0.90.1
Zk version : 3.3.2

Reproducing operarion(Region has been OPEN for too long):
1. some region server(has 1882 regions ) crashed and wait balance finished.
2.stop hbase cluster
3.start cluster.(it happened)

My case is different for yours.
Hmaster had received RS_ZK_REGION_OPENED. But processing OpenedRegionHandler is slow.
It seems no bugs. I try to change parameter hbase.master.executor.openregion.threads.


2011-04-16 09:48:48,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302918210943, region=00621f2d5f3b0f7949bbe638d78e96a5
2011-04-16 09:49:20,770 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  ufdrA,5198613861538448,1301646236548.00621f2d5f3b0f7949bbe638d78e96a5. state=OPEN, ts=1302918528615
2011-04-16 09:49:29,371 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 00621f2d5f3b0f7949bbe638d78e96a5; deleting unassigned node
2011-04-16 09:49:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f5bdad8910000-0x2f5bdad8910000-0x2f5bdad8910000 Deleting existing unassigned node for 00621f2d5f3b0f7949bbe638d78e96a5 that is in expected state RS_ZK_REGION_OPENED

-----邮件原件-----
发件人: jdcryans@gmail.com [mailto:jdcryans@gmail.com] 代表 Jean-Daniel Cryans
发送时间: 2011年4月15日 3:20
收件人: user@hbase.apache.org
主题: Re: a lots of error about Region has been OPEN for too long

Taking a more deeper look at the code, the assumption I had that the
message was sitting in the queue for too long is erroneous. The reason
is that when the master says "Handling
transition=RS_ZK_REGION_OPENED", it updates the timestamp that we use
to keep track of the timeout which means that unless it stayed more
than 60 seconds in the queue (it took only 3 here it seems), the issue
is elsewhere.

So logically the other reason would be that the master got the
information too late, since we use the timestamp provided by the
region server. Looking in the region server log, I was able to find:

2011-04-05 15:11:12,946 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x22e286f0b9c98f7-0x22e286f0b9c98f7 Successfully
transitioned node 0db7d1f58e4fced0a371aded0ddec281 from
RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED

This is 1 minute 7 seconds before the master got the message (I
verified that our clocks are in perfect sync). This means this is
another queue that's too slow (the ones that processes all the events
coming from zookeeper).

I'll keep digging.

J-D

On Thu, Apr 14, 2011 at 10:58 AM, Jean-Daniel Cryans
<jd...@apache.org> wrote:
> (please don't leave unrelated discussions at the tail of your emails)
>
> So I thought I never got that issue but wanted to make sure so I
> grepped my logs and indeed saw I got it, so I what I did is that I
> grepped the name of one of the regions that got the issue and looked
> at what was happening at that time (which you should do in the
> future). I see something like this:
>
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from
> znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set
> watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281.,
> server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191,
> region=0db7d1f58e4fced0a371aded0ddec281
> ...
> 2011-04-05 15:12:19,585 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in
> transition timed out:
> tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281.
> state=OPEN, ts=1302041472920
> 2011-04-05 15:12:19,585 ERROR
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
> for too long, we don't know where region was opened so can't do
> anything
> ...
> 2011-04-05 15:12:22,504 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned
> node
>
>
> So if I understand this correctly, the master already got the message
> via ZooKeeper but it stayed in a queue for just long enough that the
> RIT timed out and finally the OpenedRegionHandler was able to process
> it. So in the end nothing looks broken, it just means that the master
> is processing a LOT of regions being opened, while it also took the
> region server a long time to get the region opened.
>
> There are currently a few states that don't get refreshed in ZK, for
> example when a region is sitting in the region server's queue of
> regions to be opened. Very often, when there's a lot of regions to
> open (and worse if the RS has to replay recovered edits), some regions
> in that state will timeout. This needs more thinking.
>
> J-D
>
> 2011/4/13 Gaojinchao <ga...@huawei.com>:
>> In hbase version 0.90.1 .
>>
>> Is there any experience ?
>>
>> Hmaster Logs :
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>
>

Re: unable to disable tables

Posted by Ajay Govindarajan <ag...@yahoo.com>.

Ok, we need to upgrade anyway.

thanks
-ajay




________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: user@hbase.apache.org; Ajay Govindarajan <ag...@yahoo.com>
Sent: Friday, April 15, 2011 11:19 AM
Subject: Re: unable to disable tables

This sounds like an old bug but I can't quite remember, is it possible
to upgrade to 0.90.2? Would be easier to support.

Thx,

J-D

On Fri, Apr 15, 2011 at 11:17 AM, Ajay Govindarajan
<ag...@yahoo.com> wrote:
> Hbase version is: 0.89.20100924+28-1
>
>
>
>
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: user@hbase.apache.org; Ajay Govindarajan <ag...@yahoo.com>
> Sent: Friday, April 15, 2011 11:15 AM
> Subject: Re: unable to disable tables
>
> Which hbase version?
>
> J-D
>
> On Fri, Apr 15, 2011 at 11:12 AM, Ajay Govindarajan
> <ag...@yahoo.com> wrote:
>> I was doing a bulk update on a table when the RegionServer crashed. IT failed because we had not allocated enough memory to the process. So we restarted the master and region servers and everything seems to work fine except that I am not able to disable tables anymore. When I run the disable command I get the "0 row(s) in 0.0820 seconds" response.
>> However when i do a describe on the table it says that the table is still enabled and consequently i can't drop the table.
>>
>> I don't see anything in the logs. We are running in non-clustered mode.
>>
>> Any ideas as to why this might happen and how to fix this?
>>
>> thanks
>> -ajay

Re: unable to disable tables

Posted by Jean-Daniel Cryans <jd...@apache.org>.

This sounds like an old bug but I can't quite remember, is it possible
to upgrade to 0.90.2? Would be easier to support.

Thx,

J-D

On Fri, Apr 15, 2011 at 11:17 AM, Ajay Govindarajan
<ag...@yahoo.com> wrote:
> Hbase version is: 0.89.20100924+28-1
>
>
>
>
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: user@hbase.apache.org; Ajay Govindarajan <ag...@yahoo.com>
> Sent: Friday, April 15, 2011 11:15 AM
> Subject: Re: unable to disable tables
>
> Which hbase version?
>
> J-D
>
> On Fri, Apr 15, 2011 at 11:12 AM, Ajay Govindarajan
> <ag...@yahoo.com> wrote:
>> I was doing a bulk update on a table when the RegionServer crashed. IT failed because we had not allocated enough memory to the process. So we restarted the master and region servers and everything seems to work fine except that I am not able to disable tables anymore. When I run the disable command I get the "0 row(s) in 0.0820 seconds" response.
>> However when i do a describe on the table it says that the table is still enabled and consequently i can't drop the table.
>>
>> I don't see anything in the logs. We are running in non-clustered mode.
>>
>> Any ideas as to why this might happen and how to fix this?
>>
>> thanks
>> -ajay

Re: unable to disable tables

Posted by Ajay Govindarajan <ag...@yahoo.com>.

Hbase version is: 0.89.20100924+28-1





________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: user@hbase.apache.org; Ajay Govindarajan <ag...@yahoo.com>
Sent: Friday, April 15, 2011 11:15 AM
Subject: Re: unable to disable tables

Which hbase version?

J-D

On Fri, Apr 15, 2011 at 11:12 AM, Ajay Govindarajan
<ag...@yahoo.com> wrote:
> I was doing a bulk update on a table when the RegionServer crashed. IT failed because we had not allocated enough memory to the process. So we restarted the master and region servers and everything seems to work fine except that I am not able to disable tables anymore. When I run the disable command I get the "0 row(s) in 0.0820 seconds" response.
> However when i do a describe on the table it says that the table is still enabled and consequently i can't drop the table.
>
> I don't see anything in the logs. We are running in non-clustered mode.
>
> Any ideas as to why this might happen and how to fix this?
>
> thanks
> -ajay

Re: unable to disable tables

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Which hbase version?

J-D

On Fri, Apr 15, 2011 at 11:12 AM, Ajay Govindarajan
<ag...@yahoo.com> wrote:
> I was doing a bulk update on a table when the RegionServer crashed. IT failed because we had not allocated enough memory to the process. So we restarted the master and region servers and everything seems to work fine except that I am not able to disable tables anymore. When I run the disable command I get the "0 row(s) in 0.0820 seconds" response.
> However when i do a describe on the table it says that the table is still enabled and consequently i can't drop the table.
>
> I don't see anything in the logs. We are running in non-clustered mode.
>
> Any ideas as to why this might happen and how to fix this?
>
> thanks
> -ajay

unable to disable tables

Posted by Ajay Govindarajan <ag...@yahoo.com>.

I was doing a bulk update on a table when the RegionServer crashed. IT failed because we had not allocated enough memory to the process. So we restarted the master and region servers and everything seems to work fine except that I am not able to disable tables anymore. When I run the disable command I get the "0 row(s) in 0.0820 seconds" response.
However when i do a describe on the table it says that the table is still enabled and consequently i can't drop the table.

I don't see anything in the logs. We are running in non-clustered mode. 

Any ideas as to why this might happen and how to fix this?

thanks
-ajay

Re: a lots of error about Region has been OPEN for too long

Posted by Gaojinchao <ga...@huawei.com>.

It reproduces easily. 
In my cluster, It reproduces when balance region. 

Log is:
2011-04-15 14:54:21,885 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122,

Today , I will open ZK client debug log and keep digging.

I will try to make state chart, then can understand easily for assigning a region.

-----邮件原件-----
发件人: jdcryans@gmail.com [mailto:jdcryans@gmail.com] 代表 Jean-Daniel Cryans
发送时间: 2011年4月16日 1:56
收件人: user@hbase.apache.org
主题: Re: a lots of error about Region has been OPEN for too long

Yeah it's not clear from the logs why it did that, and looking through
my logs I can't see any occurrence. Can you reproduce it easily?

J-D

On Fri, Apr 15, 2011 at 12:46 AM, Gaojinchao <ga...@huawei.com> wrote:
> Thanks for your reply.
>
> Hbase version : 0.90.1
>
> I find other abnormal logs.
> Hmaster receives twice RS_ZK_REGION_OPENED event.
> It reproduces in balance processing.
> But I can't root out what brings on ?
>
> Hmaster logs:
> 2011-04-15 14:54:21,885 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
> 2011-04-15 14:54:21,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. (offlining)
> 2011-04-15 14:54:21,886 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=C4C3.site,60020,1302849920122, load=(requests=0, regions=588, usedHeap=122, maxHeap=8175) for region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/89e2c4452de2864bf3362a8f16e758d0 (region=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., server=C4C3.site,60020,1302849920122, state=RS_ZK_REGION_CLOSED)
> 2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=C4C3.site,60020,1302849920122, region=89e2c4452de2864bf3362a8f16e758d0
> 2011-04-15 14:55:14,364 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 89e2c4452de2864bf3362a8f16e758d0
> 2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. state=CLOSED, ts=1302850513941
> 2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Creating (or updating) unassigned node for 89e2c4452de2864bf3362a8f16e758d0 with OFFLINE state
> 2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; plan=hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
> 2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. to C4C5.site,60020,1302849920117
> 2011-04-15 14:55:14,705 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0
>
> 2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0
>
> //repetition
> 2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0
>
> // Hmaster abnormal logs
> 2011-04-15 14:55:14,715 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 89e2c4452de2864bf3362a8f16e758d0 from server C4C5.site,60020,1302849920117 but region was in  the state ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. state=OPEN, ts=1302850514694 and not in expected PENDING_OPEN or OPENING states
>
> 2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 89e2c4452de2864bf3362a8f16e758d0; deleting unassigned node
> 2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Deleting existing unassigned node for 89e2c4452de2864bf3362a8f16e758d0 that is in expected state RS_ZK_REGION_OPENED
> 2011-04-15 14:55:42,392 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Successfully deleted unassigned node for region 89e2c4452de2864bf3362a8f16e758d0 in expected state RS_ZK_REGION_OPENED
> 2011-04-15 14:55:42,411 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. on C4C5.site,60020,1302849920117
>
>
> Region server logs:
> 2011-04-15 14:55:14,603 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => 'ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.', STARTKEY => '1088613823076918', ENDKEY => '1088613826923071', ENCODED => 89e2c4452de2864bf3362a8f16e758d0, TABLE => {{NAME => 'ufdrA', FAMILIES => [{NAME => 'value', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'GZ', TTL => '86400', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> 2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:55:14,648 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; next sequenceid=1
> 2011-04-15 14:55:14,648 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,680 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. in region .META.,,1 with server=C4C5.site:60020, startcode=1302849920117
> 2011-04-15 14:55:14,680 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.

Re: a lots of error about Region has been OPEN for too long

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Yeah it's not clear from the logs why it did that, and looking through
my logs I can't see any occurrence. Can you reproduce it easily?

J-D

On Fri, Apr 15, 2011 at 12:46 AM, Gaojinchao <ga...@huawei.com> wrote:
> Thanks for your reply.
>
> Hbase version : 0.90.1
>
> I find other abnormal logs.
> Hmaster receives twice RS_ZK_REGION_OPENED event.
> It reproduces in balance processing.
> But I can't root out what brings on ?
>
> Hmaster logs:
> 2011-04-15 14:54:21,885 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
> 2011-04-15 14:54:21,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. (offlining)
> 2011-04-15 14:54:21,886 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=C4C3.site,60020,1302849920122, load=(requests=0, regions=588, usedHeap=122, maxHeap=8175) for region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/89e2c4452de2864bf3362a8f16e758d0 (region=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., server=C4C3.site,60020,1302849920122, state=RS_ZK_REGION_CLOSED)
> 2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=C4C3.site,60020,1302849920122, region=89e2c4452de2864bf3362a8f16e758d0
> 2011-04-15 14:55:14,364 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 89e2c4452de2864bf3362a8f16e758d0
> 2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. state=CLOSED, ts=1302850513941
> 2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Creating (or updating) unassigned node for 89e2c4452de2864bf3362a8f16e758d0 with OFFLINE state
> 2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; plan=hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
> 2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. to C4C5.site,60020,1302849920117
> 2011-04-15 14:55:14,705 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0
>
> 2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0
>
> //repetition
> 2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0
>
> // Hmaster abnormal logs
> 2011-04-15 14:55:14,715 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 89e2c4452de2864bf3362a8f16e758d0 from server C4C5.site,60020,1302849920117 but region was in  the state ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. state=OPEN, ts=1302850514694 and not in expected PENDING_OPEN or OPENING states
>
> 2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 89e2c4452de2864bf3362a8f16e758d0; deleting unassigned node
> 2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Deleting existing unassigned node for 89e2c4452de2864bf3362a8f16e758d0 that is in expected state RS_ZK_REGION_OPENED
> 2011-04-15 14:55:42,392 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Successfully deleted unassigned node for region 89e2c4452de2864bf3362a8f16e758d0 in expected state RS_ZK_REGION_OPENED
> 2011-04-15 14:55:42,411 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. on C4C5.site,60020,1302849920117
>
>
> Region server logs:
> 2011-04-15 14:55:14,603 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => 'ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.', STARTKEY => '1088613823076918', ENDKEY => '1088613826923071', ENCODED => 89e2c4452de2864bf3362a8f16e758d0, TABLE => {{NAME => 'ufdrA', FAMILIES => [{NAME => 'value', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'GZ', TTL => '86400', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> 2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
> 2011-04-15 14:55:14,648 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; next sequenceid=1
> 2011-04-15 14:55:14,648 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
> 2011-04-15 14:55:14,680 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. in region .META.,,1 with server=C4C5.site:60020, startcode=1302849920117
> 2011-04-15 14:55:14,680 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.

Re: a lots of error about Region has been OPEN for too long

Posted by Gaojinchao <ga...@huawei.com>.

Thanks for your reply.

Hbase version : 0.90.1

I find other abnormal logs. 
Hmaster receives twice RS_ZK_REGION_OPENED event. 
It reproduces in balance processing.
But I can't root out what brings on ?

Hmaster logs:
2011-04-15 14:54:21,885 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
2011-04-15 14:54:21,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. (offlining)
2011-04-15 14:54:21,886 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=C4C3.site,60020,1302849920122, load=(requests=0, regions=588, usedHeap=122, maxHeap=8175) for region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/89e2c4452de2864bf3362a8f16e758d0 (region=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., server=C4C3.site,60020,1302849920122, state=RS_ZK_REGION_CLOSED)
2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=C4C3.site,60020,1302849920122, region=89e2c4452de2864bf3362a8f16e758d0
2011-04-15 14:55:14,364 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 89e2c4452de2864bf3362a8f16e758d0
2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. state=CLOSED, ts=1302850513941
2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Creating (or updating) unassigned node for 89e2c4452de2864bf3362a8f16e758d0 with OFFLINE state
2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; plan=hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. to C4C5.site,60020,1302849920117
2011-04-15 14:55:14,705 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0

2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0

//repetition
2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, region=89e2c4452de2864bf3362a8f16e758d0

// Hmaster abnormal logs 
2011-04-15 14:55:14,715 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED for region 89e2c4452de2864bf3362a8f16e758d0 from server C4C5.site,60020,1302849920117 but region was in  the state ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. state=OPEN, ts=1302850514694 and not in expected PENDING_OPEN or OPENING states

2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 89e2c4452de2864bf3362a8f16e758d0; deleting unassigned node
2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Deleting existing unassigned node for 89e2c4452de2864bf3362a8f16e758d0 that is in expected state RS_ZK_REGION_OPENED
2011-04-15 14:55:42,392 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2f57e8192e0000 Successfully deleted unassigned node for region 89e2c4452de2864bf3362a8f16e758d0 in expected state RS_ZK_REGION_OPENED
2011-04-15 14:55:42,411 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. on C4C5.site,60020,1302849920117


Region server logs:
2011-04-15 14:55:14,603 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => 'ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.', STARTKEY => '1088613823076918', ENDKEY => '1088613826923071', ENCODED => 89e2c4452de2864bf3362a8f16e758d0, TABLE => {{NAME => 'ufdrA', FAMILIES => [{NAME => 'value', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'GZ', TTL => '86400', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:55:14,648 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; next sequenceid=1
2011-04-15 14:55:14,648 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2011-04-15 14:55:14,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2011-04-15 14:55:14,680 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. in region .META.,,1 with server=C4C5.site:60020, startcode=1302849920117
2011-04-15 14:55:14,680 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Attempting to transition node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.



-----邮件原件-----
发件人: jdcryans@gmail.com [mailto:jdcryans@gmail.com] 代表 Jean-Daniel Cryans
发送时间: 2011年4月15日 3:20
收件人: user@hbase.apache.org
主题: Re: a lots of error about Region has been OPEN for too long

Taking a more deeper look at the code, the assumption I had that the
message was sitting in the queue for too long is erroneous. The reason
is that when the master says "Handling
transition=RS_ZK_REGION_OPENED", it updates the timestamp that we use
to keep track of the timeout which means that unless it stayed more
than 60 seconds in the queue (it took only 3 here it seems), the issue
is elsewhere.

So logically the other reason would be that the master got the
information too late, since we use the timestamp provided by the
region server. Looking in the region server log, I was able to find:

2011-04-05 15:11:12,946 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x22e286f0b9c98f7-0x22e286f0b9c98f7 Successfully
transitioned node 0db7d1f58e4fced0a371aded0ddec281 from
RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED

This is 1 minute 7 seconds before the master got the message (I
verified that our clocks are in perfect sync). This means this is
another queue that's too slow (the ones that processes all the events
coming from zookeeper).

I'll keep digging.

J-D

On Thu, Apr 14, 2011 at 10:58 AM, Jean-Daniel Cryans
<jd...@apache.org> wrote:
> (please don't leave unrelated discussions at the tail of your emails)
>
> So I thought I never got that issue but wanted to make sure so I
> grepped my logs and indeed saw I got it, so I what I did is that I
> grepped the name of one of the regions that got the issue and looked
> at what was happening at that time (which you should do in the
> future). I see something like this:
>
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from
> znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set
> watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281.,
> server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191,
> region=0db7d1f58e4fced0a371aded0ddec281
> ...
> 2011-04-05 15:12:19,585 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in
> transition timed out:
> tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281.
> state=OPEN, ts=1302041472920
> 2011-04-05 15:12:19,585 ERROR
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
> for too long, we don't know where region was opened so can't do
> anything
> ...
> 2011-04-05 15:12:22,504 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned
> node
>
>
> So if I understand this correctly, the master already got the message
> via ZooKeeper but it stayed in a queue for just long enough that the
> RIT timed out and finally the OpenedRegionHandler was able to process
> it. So in the end nothing looks broken, it just means that the master
> is processing a LOT of regions being opened, while it also took the
> region server a long time to get the region opened.
>
> There are currently a few states that don't get refreshed in ZK, for
> example when a region is sitting in the region server's queue of
> regions to be opened. Very often, when there's a lot of regions to
> open (and worse if the RS has to replay recovered edits), some regions
> in that state will timeout. This needs more thinking.
>
> J-D
>
> 2011/4/13 Gaojinchao <ga...@huawei.com>:
>> In hbase version 0.90.1 .
>>
>> Is there any experience ?
>>
>> Hmaster Logs :
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>
>

Re: a lots of error about Region has been OPEN for too long

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Taking a more deeper look at the code, the assumption I had that the
message was sitting in the queue for too long is erroneous. The reason
is that when the master says "Handling
transition=RS_ZK_REGION_OPENED", it updates the timestamp that we use
to keep track of the timeout which means that unless it stayed more
than 60 seconds in the queue (it took only 3 here it seems), the issue
is elsewhere.

So logically the other reason would be that the master got the
information too late, since we use the timestamp provided by the
region server. Looking in the region server log, I was able to find:

2011-04-05 15:11:12,946 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x22e286f0b9c98f7-0x22e286f0b9c98f7 Successfully
transitioned node 0db7d1f58e4fced0a371aded0ddec281 from
RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED

This is 1 minute 7 seconds before the master got the message (I
verified that our clocks are in perfect sync). This means this is
another queue that's too slow (the ones that processes all the events
coming from zookeeper).

I'll keep digging.

J-D

On Thu, Apr 14, 2011 at 10:58 AM, Jean-Daniel Cryans
<jd...@apache.org> wrote:
> (please don't leave unrelated discussions at the tail of your emails)
>
> So I thought I never got that issue but wanted to make sure so I
> grepped my logs and indeed saw I got it, so I what I did is that I
> grepped the name of one of the regions that got the issue and looked
> at what was happening at that time (which you should do in the
> future). I see something like this:
>
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from
> znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set
> watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281.,
> server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191,
> region=0db7d1f58e4fced0a371aded0ddec281
> ...
> 2011-04-05 15:12:19,585 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in
> transition timed out:
> tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281.
> state=OPEN, ts=1302041472920
> 2011-04-05 15:12:19,585 ERROR
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
> for too long, we don't know where region was opened so can't do
> anything
> ...
> 2011-04-05 15:12:22,504 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned
> node
>
>
> So if I understand this correctly, the master already got the message
> via ZooKeeper but it stayed in a queue for just long enough that the
> RIT timed out and finally the OpenedRegionHandler was able to process
> it. So in the end nothing looks broken, it just means that the master
> is processing a LOT of regions being opened, while it also took the
> region server a long time to get the region opened.
>
> There are currently a few states that don't get refreshed in ZK, for
> example when a region is sitting in the region server's queue of
> regions to be opened. Very often, when there's a lot of regions to
> open (and worse if the RS has to replay recovered edits), some regions
> in that state will timeout. This needs more thinking.
>
> J-D
>
> 2011/4/13 Gaojinchao <ga...@huawei.com>:
>> In hbase version 0.90.1 .
>>
>> Is there any experience ?
>>
>> Hmaster Logs :
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>>
>

Re: a lots of error about Region has been OPEN for too long

Posted by Jean-Daniel Cryans <jd...@apache.org>.

(please don't leave unrelated discussions at the tail of your emails)

So I thought I never got that issue but wanted to make sure so I
grepped my logs and indeed saw I got it, so I what I did is that I
grepped the name of one of the regions that got the issue and looked
at what was happening at that time (which you should do in the
future). I see something like this:

2011-04-05 15:12:19,037 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from
znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set
watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281.,
server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
2011-04-05 15:12:19,037 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191,
region=0db7d1f58e4fced0a371aded0ddec281
...
2011-04-05 15:12:19,585 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in
transition timed out:
tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281.
state=OPEN, ts=1302041472920
2011-04-05 15:12:19,585 ERROR
org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
for too long, we don't know where region was opened so can't do
anything
...
2011-04-05 15:12:22,504 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned
node


So if I understand this correctly, the master already got the message
via ZooKeeper but it stayed in a queue for just long enough that the
RIT timed out and finally the OpenedRegionHandler was able to process
it. So in the end nothing looks broken, it just means that the master
is processing a LOT of regions being opened, while it also took the
region server a long time to get the region opened.

There are currently a few states that don't get refreshed in ZK, for
example when a region is sitting in the region server's queue of
regions to be opened. Very often, when there's a lot of regions to
open (and worse if the RS has to replay recovered edits), some regions
in that state will timeout. This needs more thinking.

J-D

2011/4/13 Gaojinchao <ga...@huawei.com>:
> In hbase version 0.90.1 .
>
> Is there any experience ?
>
> Hmaster Logs :
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
> 2011-04-08 16:33:09,385 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
>