Posted to user@hbase.apache.org by Vidhyashankar Venkataraman <vi...@yahoo-inc.com> on 2011/05/11 23:32:04 UTC

Master crash during assignment.

The master of my HBase instance (0.90.x) crashes each time it is restarted, with the exceptions shown below. Can you let me know what usually causes this? (I also saw these exceptions in a JIRA, but that issue was about an uncaught EOF exception.) Only the master dies; the region servers keep waiting for a master to come back up.

Thank you
Vidhya

The master log:

2011-05-11 21:19:04,259 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy7.closeRegion(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
        at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
        at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de., src=b3110318.yst.yahoo.net,44420,1305073517470, dest=b3110175.yst.yahoo.net,44420,1305073507459
2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de. (offlining)
2011-05-11 21:19:04,260 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy7.closeRegion(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
        at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
        at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

Re: Master crash during assignment.

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Thank you, Stack! I will try applying the patch to my code base and see if it works.

Thanks again
Vidhya




Re: Master crash during assignment.

Posted by Stack <st...@duboce.net>.
The issue (HBASE-3617) says that the patch was applied to the branch
for 0.90.2.  That's a misstatement.  The patch was not applied.  I will
apply it to the branch now.
St.Ack


Re: Master crash during assignment.

Posted by Stack <st...@duboce.net>.
Vidhya:

So it's failing to send the close to a specific server -- see the IP in
the log below -- and the other server is closing the connection
prematurely, so we get the EOFE.  Can you see anything in the logs on
that machine?
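
To illustrate how a prematurely closed connection surfaces as an EOFE: the trace fails inside DataInputStream.readInt, which throws EOFException when the stream ends before four bytes have arrived. The following is a minimal standalone sketch, not HBase code. The names EofDemo and hitsEof are made up for illustration, and an in-memory byte array stands in for the socket to the region server.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EofDemo {
    /**
     * Tries to read one 4-byte int from the given bytes, much as a client
     * would read the first field of a response. Returns true if the stream
     * ended before four bytes were available.
     */
    public static boolean hitsEof(byte[] responseBytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(responseBytes))) {
            in.readInt();
            return false;
        } catch (EOFException e) {
            // The peer "hung up" before sending a complete int.
            return true;
        } catch (IOException e) {
            // Not expected with an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hitsEof(new byte[0]));             // true: no data at all
        System.out.println(hitsEof(new byte[] {0, 0, 0, 7})); // false: a full int was read
    }
}
```
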

Regarding the EOFE crashing the Master, you might want to pick up a TRUNK
change.  See http://hbase.apache.org/xref/org/apache/hadoop/hbase/master/AssignmentManager.html#1261
(this is how TRUNK looks).  Notice that its catch is more generic than
what you currently have.  Alternatively, just add a catch for the EOFE.
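
As a rough sketch of the "more generic catch" idea: the code below is not the HBASE-3617 patch and does not use HBase APIs. CloseRegionSketch, RegionCloser and tryClose are hypothetical names, and the lambdas only simulate a region server. It assumes that the fix amounts to catching the common supertype IOException around the close call, so that an EOFException or a NoRouteToHostException is reported to the caller instead of propagating up and aborting.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.NoRouteToHostException;

public class CloseRegionSketch {
    /** Hypothetical stand-in for the RPC that asks a server to close a region. */
    public interface RegionCloser {
        void closeRegion(String regionName) throws IOException;
    }

    /**
     * Sends the close request and reports whether it succeeded.
     * Catching IOException covers EOFException, NoRouteToHostException
     * and other I/O failures, so the caller can skip or retry the region
     * rather than treat the failure as fatal.
     */
    public static boolean tryClose(RegionCloser server, String regionName) {
        try {
            server.closeRegion(regionName);
            return true;
        } catch (IOException e) {
            System.err.println("Close of " + regionName + " failed, not aborting: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        RegionCloser healthy = region -> { };
        RegionCloser droppedConnection = region -> { throw new EOFException(); };
        RegionCloser unreachable = region -> { throw new NoRouteToHostException("no route"); };

        System.out.println(tryClose(healthy, "r1"));           // true
        System.out.println(tryClose(droppedConnection, "r1")); // false
        System.out.println(tryClose(unreachable, "r1"));       // false
    }
}
```

Whether a failed close should be retried, skipped, or handed to server-shutdown handling is a separate decision that this sketch leaves to the caller.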

The patch is actually kinda small and targeted explicitly at fixing the
likes of what you are seeing:

+   HBASE-3617  NoRouteToHostException during balancing will cause Master abort
+               (Ted Yu via Stack)

Let me know if it works for you.  If so, I'll backport it to the branch.

St.Ack



On Wed, May 11, 2011 at 2:32 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> The master of my Hbase instance (0.90.x) crashes each time it is restarted, with the exceptions shown below. Can you let me know what this is usually due to? (I also saw these exceptions in a JIRA but they were about uncaught EOF exception). Only the master dies while the region servers wait for a master to wake back up.
>
> Thank you
> Vidhya
>
> The master log:
>
> 2011-05-11 21:19:04,259 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy7.closeRegion(Unknown Source)
>        at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
>        at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
>        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
>        at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de., src=b3110318.yst.yahoo.net,44420,1305073517470, dest=b3110175.yst.yahoo.net,44420,1305073507459
> 2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de. (offlining)
> 2011-05-11 21:19:04,260 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy7.closeRegion(Unknown Source)
>        at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
>        at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
>        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
>        at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>