Posted to hdfs-user@hadoop.apache.org by Rahul Das <ra...@gmail.com> on 2011/07/22 09:15:57 UTC

Hadoop Namenode problem

Hi,

I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the
NameNode was not responding (no reads or writes to HDFS were happening). It was
stuck for a few hours, so I shut down the NameNode and found the following
error in the NameNode log.

2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
xx.xx.xx.xx:13568: output error

This error appeared for every data node, and the data nodes were not able to
communicate with the NameNode.

After I restarted the NameNode:

2011-07-21 16:31:54,110 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2011-07-21 16:31:54,223 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
xx.xx.xx.xx:9000
2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-07-21 16:31:54,226 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=false
2011-07-21 16:31:54,287 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-07-21 16:31:54,289 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 15817482
2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 82
2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2042701824 loaded in 166 seconds.
2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
in 1406 seconds.

Then it halts for a long time. After about an hour it starts working again.

My questions are: when does the "IPC Server Responder" error occur, and is
there a way to deal with it?
Also, if my NameNode is busy doing something, how can I find out what it is
doing?

Regards,
Rahul
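One way to see where a restarting NameNode spends its time is simply to diff the timestamps of the startup log lines quoted above. The sketch below is illustrative only (not from the thread); `phase_seconds` is a hypothetical helper for the log's timestamp format:

```python
from datetime import datetime

# Hypothetical helper: given two NameNode log lines, compute how long the
# phase between them took, using the "2011-07-21 16:31:54,880" prefix format.
def phase_seconds(start_line: str, end_line: str) -> float:
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    t0 = datetime.strptime(start_line[:23], fmt)
    t1 = datetime.strptime(end_line[:23], fmt)
    return (t1 - t0).total_seconds()

# Timestamps from the startup log above: the fsimage scan began at 16:31:54
# ("Number of files = ...") and the edits replay finished at 16:58:07.
load_start = "2011-07-21 16:31:54,880"
edits_done = "2011-07-21 16:58:07,624"
print(phase_seconds(load_start, edits_done))  # 1572.744 seconds in total
```

Applied to the log above, almost all of the startup wall-clock time falls in the edits-replay phase, which points at the edits file rather than the fsimage.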

Re: Hadoop Namenode problem

Posted by Joey Echeverria <jo...@cloudera.com>.
Yes, it should print something along the lines of:

The reported blocks 11 has reached the threshold 0.9990 of total
blocks 11. Safe mode will be turned off automatically in 8 seconds.

-Joey
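If you want to spot that countdown message programmatically, a small log scan works; the exact wording below is taken from Joey's example and may differ slightly between Hadoop versions:

```python
import re

# Sketch: match the safe-mode countdown line Joey describes. The wording is
# assumed from his example, not verified against the Hadoop source.
SAFEMODE_RE = re.compile(
    r"The reported blocks (\d+) has reached the threshold ([\d.]+) of total\s+"
    r"blocks (\d+)\. Safe mode will be turned off automatically in (\d+) seconds"
)

line = ("The reported blocks 11 has reached the threshold 0.9990 of total "
        "blocks 11. Safe mode will be turned off automatically in 8 seconds.")
m = SAFEMODE_RE.search(line)
print(m.group(4) if m else None)  # -> "8"
```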




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Hadoop Namenode problem

Posted by Rahul Das <ra...@gmail.com>.
No, there was no error; only the following appears:

2011-07-21 14:14:30,039 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=hadoop,hadoop       ip=/xx.xx.xx.xx  cmd=create
src=/user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f
dst=null        perm=hadoop:supergroup:rw-r--r--
2011-07-21 14:14:30,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock: /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f
. blk_-3217626427379030207_15834365
2011-07-21 14:14:30,120 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file
/user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f
is closed by DFSClient_1277823200

Is there any way I can find out from the log when safe mode ends?

Regards,
Rahul


Re: Hadoop Namenode problem

Posted by Joey Echeverria <jo...@cloudera.com>.
Nothing from around 1630?

-Joey




Re: Hadoop Namenode problem

Posted by Rahul Das <ra...@gmail.com>.
Hi Joey,

The log is too big to attach to the mail. What I found is that there are no
errors during this time, only a few warnings like:

2011-07-21 14:13:47,814 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
...
...
2011-07-21 14:30:49,511 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for
block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010
current size is 1950720 reported size is 2448907

I think the edits file was too large; that's why it took a long time.

Regards,
Rahul
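A back-of-the-envelope check of that theory (not from the thread) using the numbers in the logs above: the edits file held 138217 transactions and replayed in 1406 seconds, i.e. the NameNode applied edits at roughly 100 per second.

```python
# Numbers copied from the NameNode startup log quoted earlier in the thread:
# "Edits file ... of size 12751835 edits # 138217 loaded in 1406 seconds."
edits_count = 138217      # number of logged transactions
replay_seconds = 1406     # time to replay them on startup
edits_bytes = 12751835    # on-disk size of the edits file

rate = edits_count / replay_seconds
print(round(rate, 1))  # ~98.3 edits applied per second
```

At that rate, any edits file with more than a few hundred thousand transactions would take a substantial fraction of an hour to replay, which matches the observed startup delay.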


Re: Hadoop Namenode problem

Posted by Joey Echeverria <jo...@cloudera.com>.
The long startup time after the restart looks like it was caused by the
SecondaryNameNode not having been able to roll the edits log for some time.
Can you post your NameNode log from around the same time as this
SecondaryNameNode log (2011-07-21 16:00-16:30)?

-Joey
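For context, the checkpoint cycle Joey is referring to can be sketched as follows. In this Hadoop generation the SecondaryNameNode starts a checkpoint when either the `fs.checkpoint.period` timer expires or the edits file grows past `fs.checkpoint.size`; the defaults shown are assumptions, so verify them against your version's hdfs-default.xml:

```python
# Assumed defaults for this Hadoop generation (check hdfs-default.xml):
CHECKPOINT_PERIOD = 3600            # fs.checkpoint.period, seconds
CHECKPOINT_SIZE = 64 * 1024 * 1024  # fs.checkpoint.size, bytes

def checkpoint_due(seconds_since_last: int, edits_size_bytes: int) -> bool:
    """Mirror of the SecondaryNameNode's checkpoint trigger condition."""
    return (seconds_since_last >= CHECKPOINT_PERIOD
            or edits_size_bytes >= CHECKPOINT_SIZE)

# The edits file in this thread was 12751835 bytes, well under 64 MB, so only
# the hourly timer would have triggered a checkpoint here.
print(checkpoint_due(1800, 12751835))  # -> False
```

If checkpoints keep failing (as the `doCheckpoint` exception below shows), the edits file grows without bound, and the next NameNode restart pays for it in replay time.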

On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <ra...@gmail.com> wrote:

> Yes I have a secondary Namenode running. Here are the log for
> SecondaryNamenode
>
> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835
> edits # 138217 loaded in 1581 seconds.
> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 2045516451 saved in 29 seconds.
> 2011-07-21 16:03:24,974 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
> 0 Number of syncs: 0 SyncTimes(ms): 0
> 2011-07-21 16:03:25,545 INFO
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
> xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
> 2011-07-21 16:29:24,356 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
> doCheckpoint:
> 2011-07-21 16:29:24,358 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
> java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception:
> java.io.IOException: Connection reset by peer
>
> Regards,
> Rahul
>
>
> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <jo...@cloudera.com> wrote:
>
>> Do you have an instance of the SecondaryNamenode in your cluster?
>>
>> -Joey
>>
>>
>> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <ra...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running a Hadoop cluster with 20 Data node. Yesterday I found that
>>> the Namenode was not responding ( No write/read to HDFS is happening). It
>>> got stuck for few hours, then I shut down the Namenode and found the
>>> following error from the Name node log.
>>>
>>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
>>> Responder, call
>>> getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
>>> xx.xx.xx.xx:13568: output error
>>>
>>> This error was coming for every data node and data nodes are not able to
>>> communicate with the Name node
>>>
>>> After I restart the Namenode
>>>
>>> 2011-07-21 16:31:54,110 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>>> Initializing RPC Metrics with hostName=NameNode, port=9000
>>> 2011-07-21 16:31:54,223 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>>> xx.xx.xx.xx:9000
>>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>> Initializing JVM Metrics with processName=NameNode, sessionId=null
>>> 2011-07-21 16:31:54,226 INFO
>>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
>>> NameNodeMeterics using context
>>> object:org.apache.hadoop.metrics.spi.NullContext
>>> 2011-07-21 16:31:54,280 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>> 2011-07-21 16:31:54,280 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>> 2011-07-21 16:31:54,280 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isPermissionEnabled=false
>>> 2011-07-21 16:31:54,287 INFO
>>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>>> Initializing FSNamesystemMetrics using context
>>> object:org.apache.hadoop.metrics.spi.NullContext
>>> 2011-07-21 16:31:54,289 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStatusMBean
>>> 2011-07-21 16:31:54,880 INFO
>>> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
>>> 2011-07-21 16:34:38,463 INFO
>>> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
>>> construction = 82
>>> 2011-07-21 16:34:41,177 INFO
>>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size
>>> 2042701824 loaded in 166 seconds.
>>> 2011-07-21 16:58:07,624 INFO
>>> org.apache.hadoop.hdfs.server.common.Storage: Edits file
>>> /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406
>>> seconds.
>>>
>>> And then it halts for a long time. After about an hour it starts working
>>> again.
>>>
>>> My question is: when does this "IPC Server Responder" error occur, and is
>>> there a way to deal with it? Also, if my Namenode is busy doing something,
>>> how can I find out what it is doing?
>>>
>>> Regards,
>>> Rahul
>>
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>>
>


-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: Hadoop Namenode problem

Posted by Rahul Das <ra...@gmail.com>.
Yes, I have a SecondaryNameNode running. Here are the logs from the
SecondaryNameNode:

2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835
edits # 138217 loaded in 1581 seconds.
2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2045516451 saved in 29 seconds.
2011-07-21 16:03:24,974 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
0 Number of syncs: 0 SyncTimes(ms): 0
2011-07-21 16:03:25,545 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
2011-07-21 16:29:24,356 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2011-07-21 16:29:24,358 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception:
java.io.IOException: Connection reset by peer

Regards,
Rahul

On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <jo...@cloudera.com> wrote:

> Do you have an instance of the SecondaryNamenode in your cluster?
>
> -Joey
>
>
> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <ra...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that
>> the Namenode was not responding (no reads or writes to HDFS were
>> happening). It was stuck for a few hours, so I shut down the Namenode and
>> found the following error in the Namenode log.
>>
>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
>> Responder, call
>> getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
>> xx.xx.xx.xx:13568: output error
>>
>> This error was appearing for every data node, and the data nodes were not
>> able to communicate with the Namenode.
>>
>> After I restarted the Namenode:
>>
>> 2011-07-21 16:31:54,110 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>> Initializing RPC Metrics with hostName=NameNode, port=9000
>> 2011-07-21 16:31:54,223 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>> xx.xx.xx.xx:9000
>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=NameNode, sessionId=null
>> 2011-07-21 16:31:54,226 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
>> NameNodeMeterics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-07-21 16:31:54,280 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>> 2011-07-21 16:31:54,280 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> 2011-07-21 16:31:54,280 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isPermissionEnabled=false
>> 2011-07-21 16:31:54,287 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>> Initializing FSNamesystemMetrics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-07-21 16:31:54,289 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStatusMBean
>> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Number of files = 15817482
>> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Number of files under construction = 82
>> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Image file of size 2042701824 loaded in 166 seconds.
>> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
>> in 1406 seconds.
>>
>> And then it halts for a long time. After about an hour it starts working
>> again.
>>
>> My question is: when does this "IPC Server Responder" error occur, and is
>> there a way to deal with it? Also, if my Namenode is busy doing something,
>> how can I find out what it is doing?
>>
>> Regards,
>> Rahul
>
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
>

Re: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used

Posted by Harsh J <ha...@cloudera.com>.
Florin,

I believe you answered yourself accidentally?

On Thu, Jul 28, 2011 at 4:10 PM, Florin P <fl...@yahoo.com> wrote:
> --- On Fri, 7/22/11, Florin P <fl...@yahoo.com> wrote:
>
> From: Florin P <fl...@yahoo.com>
> Subject: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used
> To: hdfs-user@hadoop.apache.org
> Date: Friday, July 22, 2011, 8:34 AM
>
> Hello!
>   I would like to ask you: how can I obtain the filenames that are processed by the Map class when CombineFileInputFormat is used?
>    As far as I know, when CombineFileInputFormat is used, multiple files are processed by the same mapper. In my case, I would like to know how to obtain these file names.

Depending on how you have implemented your per-FileSplit record
readers in the CombineFileInputFormat, you can set "map.input.file" in
the Configuration instance during each reader's initialization. This is
somewhat self-managed here, since several record readers may be
initialized. Let me know if you would like to see a simple example as well.
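Such an example might look roughly like the sketch below. It is untested and assumes the new-style `mapreduce` API, where `CombineFileRecordReader` constructs one per-file reader reflectively through a `(CombineFileSplit, TaskAttemptContext, Integer)` constructor; the class name `FileNameTrackingReader` is made up for illustration and does not come from this thread:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// One instance of this reader is created for each file in the combined split.
public class FileNameTrackingReader extends RecordReader<LongWritable, Text> {

  private final LineRecordReader delegate = new LineRecordReader();
  private final CombineFileSplit split;
  private final int index; // which file of the CombineFileSplit this reader covers

  public FileNameTrackingReader(CombineFileSplit split,
                                TaskAttemptContext context, Integer index) {
    this.split = split;
    this.index = index.intValue();
  }

  @Override
  public void initialize(InputSplit ignored, TaskAttemptContext context)
      throws IOException, InterruptedException {
    // Publish the file this reader is about to read, as described above.
    context.getConfiguration()
        .set("map.input.file", split.getPath(index).toString());
    // Delegate to a plain line reader over this file's slice of the split.
    FileSplit fileSplit = new FileSplit(split.getPath(index),
        split.getOffset(index), split.getLength(index), split.getLocations());
    delegate.initialize(fileSplit, context);
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    return delegate.nextKeyValue();
  }

  @Override
  public LongWritable getCurrentKey() { return delegate.getCurrentKey(); }

  @Override
  public Text getCurrentValue() { return delegate.getCurrentValue(); }

  @Override
  public float getProgress() throws IOException { return delegate.getProgress(); }

  @Override
  public void close() throws IOException { delegate.close(); }
}
```

A CombineFileInputFormat subclass would then hand this class to CombineFileRecordReader in its createRecordReader method; the mapper can read "map.input.file" from the configuration, though since several readers run per map task the value changes as each file is entered.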

--
Harsh J

Re: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used

Posted by Florin P <fl...@yahoo.com>.
Hello!
 In Hadoop 0.20, you'll do the following in the mapper class:
1. Create a field "job" of type JobConf.
2. In the "configure" method of the mapper class, initialize the job field
   with the received argument.
3. In the map function, get the name of the file being processed via the
   property map.input.file (for example, job.get("map.input.file")).
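A minimal sketch of those steps, using the old (0.20) `mapred` API; the class name and the (file name, line) output are illustrative choices, not from this thread:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Step 1: a field holding the job configuration.
  private JobConf job;

  // Step 2: cache the JobConf that the framework passes in.
  @Override
  public void configure(JobConf job) {
    this.job = job;
  }

  // Step 3: read the current input file name from "map.input.file".
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String fileName = job.get("map.input.file");
    output.collect(new Text(fileName), value); // emit (file name, line)
  }
}
```

Note that under CombineFileInputFormat the property is only meaningful if the record reader actually sets it per file, as Harsh points out elsewhere in this thread.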

I hope this helps.
  Regards,
 Florin


--- On Fri, 7/22/11, Florin P <fl...@yahoo.com> wrote:

From: Florin P <fl...@yahoo.com>
Subject: Obtain the filename that is processed by the Map class when CombineFileInputFormat is used
To: hdfs-user@hadoop.apache.org
Date: Friday, July 22, 2011, 8:34 AM

Hello!
  I would like to ask you: how can I obtain the filenames that are processed by the Map class when CombineFileInputFormat is used?
   As far as I know, when CombineFileInputFormat is used, multiple files are processed by the same mapper. In my case, I would like to know how to obtain these file names.

I look forward to your answers. Thank you.
  Regards,
  Florin

Obtain the filename that is processed by the Map class when CombineFileInputFormat is used

Posted by Florin P <fl...@yahoo.com>.
Hello!
  I would like to ask you: how can I obtain the filenames that are processed by the Map class when CombineFileInputFormat is used?
   As far as I know, when CombineFileInputFormat is used, multiple files are processed by the same mapper. In my case, I would like to know how to obtain these file names.

I look forward to your answers. Thank you.
  Regards,
  Florin

Re: Hadoop Namenode problem

Posted by Joey Echeverria <jo...@cloudera.com>.
Do you have an instance of the SecondaryNamenode in your cluster?

-Joey

On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <ra...@gmail.com> wrote:

> Hi,
>
> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that
> the Namenode was not responding (no reads or writes to HDFS were
> happening). It was stuck for a few hours, so I shut down the Namenode and
> found the following error in the Namenode log.
>
> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call
> getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
> xx.xx.xx.xx:13568: output error
>
> This error was appearing for every data node, and the data nodes were not
> able to communicate with the Namenode.
>
> After I restarted the Namenode:
>
> 2011-07-21 16:31:54,110 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2011-07-21 16:31:54,223 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> xx.xx.xx.xx:9000
> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2011-07-21 16:31:54,226 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
> NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,280 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2011-07-21 16:31:54,280 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2011-07-21 16:31:54,280 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=false
> 2011-07-21 16:31:54,287 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,289 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 15817482
> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 82
> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 2042701824 loaded in 166 seconds.
> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
> in 1406 seconds.
>
> And then it halts for a long time. After about an hour it starts working
> again.
>
> My question is: when does this "IPC Server Responder" error occur, and is
> there a way to deal with it? Also, if my Namenode is busy doing something,
> how can I find out what it is doing?
>
> Regards,
> Rahul




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434