Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/02/18 05:59:28 UTC

[jira] Created: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.
---------------------------------------------------------------------------------------------

                 Key: HBASE-2235
                 URL: https://issues.apache.org/jira/browse/HBASE-2235
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
             Fix For: 0.20.4, 0.21.0


Here is the short story:

Scenario is a cluster of 3 servers.  Server 1 crashed.  It was carrying .META.  We split its logs.  .META. is put at the head of the assignment queue.  Server 2 happens to be in a state where it wants to report a split.  The master fails the report because there is no .META. (it fails it ugly, with an NPE).  Server 3 checks in and falls into the assignment code (RegionManager#regionsAwaitingAssignment).  In here we have this bit of code, around line #412:

{code}
    if (reassigningMetas && isMetaOrRoot && !isSingleServer) {
      return regionsToAssign; // dont assign anything to this server.
    }
{code}

Because we think this is not a single-server cluster -- we think there are two 'live' nodes -- we won't assign .META.
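
The resulting deadlock can be sketched with a toy model of that guard (hypothetical class and method names; not the real RegionManager code):

```java
import java.util.ArrayList;
import java.util.List;

public class MetaAssignGuard {
    /**
     * Mirrors the quoted condition: while metas are being reassigned, a
     * multi-server cluster refuses to hand .META. to the checking-in server.
     */
    static List<String> regionsAwaitingAssignment(boolean reassigningMetas,
                                                  boolean isMetaOrRoot,
                                                  int liveServers) {
        boolean isSingleServer = liveServers == 1;
        List<String> regionsToAssign = new ArrayList<>();
        if (reassigningMetas && isMetaOrRoot && !isSingleServer) {
            return regionsToAssign; // dont assign anything to this server.
        }
        regionsToAssign.add(".META.");
        return regionsToAssign;
    }

    public static void main(String[] args) {
        // Server 1 (the .META.-holder) is dead; servers 2 and 3 are live.
        // Every check-in takes the early return, so .META. is never assigned.
        System.out.println(regionsAwaitingAssignment(true, true, 2)); // []
    }
}
```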

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835874#action_12835874 ] 

Kannan Muthukkaruppan commented on HBASE-2235:
----------------------------------------------

The original one which crashed/GC paused was test015.  The second log I sent you was of course test014, as the name indicated.  Unfortunately, I have lost test013's log at this point.



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836078#action_12836078 ] 

stack commented on HBASE-2235:
------------------------------

@Kannan For sure, let's do something in 0.20.  I think the metascanner, when it runs, can check an offlined region.  If it's not followed by two daughters -- it has the necessary info in its splitA and splitB columns -- it can add them.  Let me take a closer look.  Will report back (am on something else this afternoon).
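
The fixup idea could look roughly like this (a toy sketch, not the actual MetaScanner code; the map-of-maps stands in for .META. rows and their columns):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DaughterFixup {
    /**
     * Scan rows; for each offlined split parent, re-add any daughter named
     * in its splitA/splitB columns that is missing from the meta map.
     * Returns the number of daughter rows added back.
     */
    static int fixup(Map<String, Map<String, String>> meta) {
        int added = 0;
        // Copy the rows first so we can insert daughters while scanning.
        for (Map<String, String> row : new java.util.ArrayList<>(meta.values())) {
            if (!"true".equals(row.get("info:offline"))) continue;
            for (String col : new String[] {"info:splitA", "info:splitB"}) {
                String daughter = row.get(col);
                if (daughter != null && !meta.containsKey(daughter)) {
                    meta.put(daughter, new LinkedHashMap<>());
                    added++;
                }
            }
        }
        return added;
    }
}
```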

Having it all in the one row is a bit radical.  Gets and puts would each take out a row lock; I think this might slow down all .META. lookups.  We also have a mechanism for getting the row closest to the asked-for one.  It's used for figuring out which region a row sits in.  This would have to be recast if everything was in the one row.
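
The closest-row mechanism referred to can be sketched with a sorted map (illustrative only; in HBase the lookup goes through a getClosestRowBefore-style call against the .META. region):

```java
import java.util.Map;
import java.util.TreeMap;

public class ClosestRowBefore {
    // .META. sketch: region start row -> region name
    static final TreeMap<String, String> meta = new TreeMap<>();

    /** Find the region whose start key is the greatest one <= the asked-for row. */
    static String regionFor(String row) {
        Map.Entry<String, String> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        meta.put("test1,", "region-A");        // empty startkey sorts first
        meta.put("test1,1204765", "region-B");
        System.out.println(regionFor("test1,1300000")); // region-B
    }
}
```

If all of a table's regions lived in one fat row instead, this per-row ordering trick would have to be replaced with a scan inside that row's columns.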



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835870#action_12835870 ] 

stack commented on HBASE-2235:
------------------------------

@Kannan Do I have the test013 regionserver log?  Is that the one that crashed/GC paused (I have that one)?  I'm asking because test1,249509,1266107874886 was the parent whose daughters were not online after you did your restart.  I'd like to see its log to see how come the parent offlining worked but not the daughter additions, especially if you had HDFS-200 in place.



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835605#action_12835605 ] 

Kannan Muthukkaruppan commented on HBASE-2235:
----------------------------------------------

stack: Could you clarify in a bit more detail what the normal flow of events is between the requesting RS, the .META. holder, and the master when a split happens?  BTW, we were running with syncFs (HDFS-200) in our testing when this whole issue happened.



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835592#action_12835592 ] 

stack commented on HBASE-2235:
------------------------------

Taking a look at the regionserver that was trying to report the split: it had successfully added the splits to the .META. table.  The .META.-holder crashed between the adding of the edits and the sending of the message to the master.  The .META.-holder probably lost the daughter edits and the offlining of the parent if there was no working flush.



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835985#action_12835985 ] 

Kannan Muthukkaruppan commented on HBASE-2235:
----------------------------------------------

@stack: You wrote <<<The report of split is not atomic; we are sending 3 separate puts. We don't have means of making a cross-row transaction out of this operation>>>.

Would keeping all region info for a table as columns under a single row key in .META. be such a crazy idea?  With that you could do atomic mutations to a given table's region info.  The con would be that the entire row would be hosted on a single server... but for most tables that'd probably already be the case unless the table has an exorbitant number of regions.
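
The idea can be sketched like so (a hypothetical layout, not the real .META. schema): one row per table, one column per region, so recording a split becomes a single-row mutation that the row lock makes atomic:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class OneRowPerTableMeta {
    // row key = table name; columns = region name -> serialized region state
    static final Map<String, TreeMap<String, String>> meta = new ConcurrentHashMap<>();

    /** Apply parent offlining and both daughter additions as one atomic step. */
    static void recordSplit(String table, String parent, String dA, String dB) {
        meta.compute(table, (t, row) -> {
            if (row == null) row = new TreeMap<>();
            row.put(parent, "OFFLINE,SPLIT=true");
            row.put(dA, "ONLINE");
            row.put(dB, "ONLINE");
            return row; // all three edits land under one row key together
        });
    }
}
```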



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835137#action_12835137 ] 

ryan rawson commented on HBASE-2235:
------------------------------------

Can we fail a message back to the regionserver?  Or are we going to have to accept SPLIT messages when .META. is offline and then queue them until .META. comes up?



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836104#action_12836104 ] 

Kannan Muthukkaruppan commented on HBASE-2235:
----------------------------------------------

Forked off the "META getting inconsistent" issue into its own JIRA (HBASE-2244).



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835485#action_12835485 ] 

stack commented on HBASE-2235:
------------------------------

I don't think we can queue splits in the master and apply them when .META. comes up; the master could die and drop them.

What if we add a message to the regionserver+master protocol, one that says FAILED?  A FAILED could carry a message from the master like any other, such as a .META. assignment.
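
A rough sketch of the proposal (hypothetical message names, not the real HMsg protocol): the rejection is itself a message, and like any other master message it can piggyback work such as a .META. assignment:

```java
public class HeartbeatProtocol {
    enum Type { OPEN_REGION, CLOSE_REGION, FAILED }

    static class MasterMessage {
        final Type type;
        final String payload; // e.g. a region to open, or the rejected report
        MasterMessage(Type type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    /** Master-side handling of a split report while .META. is unassigned. */
    static MasterMessage handleSplitReport(boolean metaOnline, String report) {
        if (!metaOnline) {
            // Reject the report; the RS keeps it queued and retries later.
            return new MasterMessage(Type.FAILED, report);
        }
        return new MasterMessage(Type.OPEN_REGION, "daughter-of:" + report);
    }
}
```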



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835615#action_12835615 ] 

stack commented on HBASE-2235:
------------------------------

@Kannan

See CompactSplitThread#split.  See how we do a few puts to the .META. table: one to offline the split parent, and one each to add the new daughter regions.  We then add a 'split' message to the queue of messages to send the master.  These get sent on the next heartbeat, 'informing' the master of the split so it will go about assigning the new daughters (IIRC, if this message never made it across, the periodic scan of .META. would notice the new unassigned daughters and add them to the to-be-assigned list).

Let me know if you need more.
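
The flow above, condensed into a toy simulation (illustrative names only; the real path is CompactSplitThread#split plus the heartbeat machinery):

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

public class SplitReportFlow {
    static final Map<String, String> metaTable = new LinkedHashMap<>();
    static final Queue<String> outboundToMaster = new ArrayDeque<>();

    static void reportSplit(String parent, String daughterA, String daughterB) {
        // Three independent puts -- there is no cross-row transaction here,
        // which is exactly the window this issue is about.
        metaTable.put(parent, "OFFLINE,SPLIT=true");
        metaTable.put(daughterA, "UNASSIGNED");
        metaTable.put(daughterB, "UNASSIGNED");
        // Queued for the next heartbeat; if lost, the periodic .META. scan
        // would eventually notice the unassigned daughters anyway.
        outboundToMaster.add("MSG_REPORT_SPLIT:" + parent);
    }
}
```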



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835618#action_12835618 ] 

stack commented on HBASE-2235:
------------------------------

@Kannan The report of a split is not atomic; we are sending 3 separate puts, and we have no means of making a cross-row transaction out of this operation.  And then there is the message to the master.  That message used to be done as three messages, but in head of branch it was collapsed into one.  We need to improve upon the above.  Using something like the Todd-suggested recipe, we'd first write all details on the split (parent and daughters) to our WAL.  If we died in the middle of the .META. update or the report to the master, the replay of the log on open of the region post-crash in its new location would find the split record, check that the split made it into .META., and do fixup if the split is not all present (if the split was incompletely recorded in the log, clean up the failed split and retry it).
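
The recovery half of that recipe might look like this (hypothetical record format; the WAL entry is modeled as a string array of parent plus daughters):

```java
import java.util.List;
import java.util.Map;

public class WalFirstSplit {
    /**
     * Replay split-intent records from the WAL against .META. and re-insert
     * any row the crash kept out. Incomplete records mean the split never
     * finished logging, so the caller should clean up and retry that split.
     * Returns the number of rows fixed up.
     */
    static int replayFixup(List<String[]> wal, Map<String, String> meta) {
        int fixed = 0;
        for (String[] rec : wal) {           // rec = {parent, daughterA, daughterB}
            if (rec.length < 3) continue;    // incompletely recorded split
            for (String region : rec) {
                if (!meta.containsKey(region)) { // this put never made it in
                    meta.put(region, region.equals(rec[0]) ? "OFFLINE,SPLIT=true"
                                                           : "UNASSIGNED");
                    fixed++;
                }
            }
        }
        return fixed;
    }
}
```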



[jira] Commented: (HBASE-2235) Mechanism that would not have -ROOT- and .META. on same server caused failed assign of .META.

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835979#action_12835979 ] 

Kannan Muthukkaruppan commented on HBASE-2235:
----------------------------------------------

I managed to get the .META. table inconsistent again in my small test cluster under load.  The region server went down due to some errors from the HDFS layer... which we are separately following up on (probably just too much compaction and other activity going on at the same time).

I know I can run add_table to restore its sanity.  But we have managed to get .META. inconsistent enough times now that it might make sense to do something about it in the 0.20.x timeframe (either make .META. updates atomic, or have the meta scanner fix broken children).

So, roughly, here is what happened today.

(i) An RS got a lot of org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException errors, followed by:

{code}
2010-02-19 08:49:07,102 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9144926768183088527_186431 bad datanode[0] nodes == null
2010-02-19 08:49:07,102 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase-kannan1/test1/580635726/actions/133921297\
0969249937" - Aborting...
2010-02-19 08:49:07,117 FATAL org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. Forcing server shutdown
{code}

(ii) During shutdown there were other errors like:

{code}
2010-02-19 08:51:07,557 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region test1,1761194,1266576717079
java.io.IOException: Filesystem closed
2010-02-19 08:51:07,660 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down HRegionServer: file system not available
java.io.IOException: File system is not available
...

2010-02-19 08:50:07,321 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Tried to hold up flushing for compactions of region test1,1761194,126657\
6717079 but have waited longer than 90000ms, continuing
2010-02-19 08:50:07,322 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,1761194,1266576717079, flushing=false, w\
ritesEnabled=false
2010-02-19 08:50:07,348 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call put([B@39804c99, [Lorg.apache.hadoop.hbase.client.Put;@1624ee4d)\
 from 10.131.1.186:36796: output error
2010-02-19 08:50:07,354 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call put([B@5f3034b2, [Lorg.apache.hadoop.hbase.client.Put;@55d3c2f0)\
 from 10.131.1.186:36796: output error
2010-02-19 08:50:07,354 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 82 on 60020 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)
{code}

After all this, I restarted the RS.  But several regions seem to be in an odd state in .META.  For example, for a particular startkey, I see all these entries:
{code}
test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, value=REGION => {NAME => 'test1,
                             1204765,1266569946560', STARTKEY => '1204765', ENDKEY => '1441091', ENCODED => 18
                             19368969, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
                              [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
                             , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, value=10.129.68.212:60020
 test1,1204765,1266569946560 column=info:serverstartcode, timestamp=1266570029133, value=1266562597546
 test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, value=\x00\x071441091\x00\x00\x00\x0
                             1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
                             05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
                             00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
                             \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
                             ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
                             L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
                             \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
                             rueh\x0FQ\xCF
 test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, value=REGION => {NAME => 'test1,
                             1204765,1266581233447', STARTKEY => '1204765', ENDKEY => '1290703', ENCODED => 13
                             73493090, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'test1', FAMILIES =>
                              [{NAME => 'actions', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647'
                             , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, value=10.129.68.213:60020
 test1,1204765,1266581233447 column=info:serverstartcode, timestamp=1266604768670, value=1266562597511
 test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, value=\x00\x071226169\x00\x00\x00\x0
                             1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
                             x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
                             x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
                             0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
                             x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
                             00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
                             0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
                             \xB9\xBD\xFEO
 test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, value=\x00\x071290703\x00\x00\x00\x0
                             1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
                             x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
                             x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
                             0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
                             x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
                             00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
                             0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
                             \xE1\xDF\xF8p
 test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, value=REGION => {NAME => 'test1,
                             1204765,1266609171581', STARTKEY => '1204765', ENDKEY => '1226169', ENCODED => 21
                             34878372, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'actions', VERSIONS =
                             > '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMOR
                             Y => 'false', BLOCKCACHE => 'true'}]}}
{code} 
