Posted to mapreduce-user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2013/05/13 04:28:56 UTC

The minimum memory requirements to datanode and namenode?

Hi,

I set up a cluster with 3 nodes, and after that I did not submit any job on
it. But, after a few days, I found the cluster was unhealthy:
- No result is returned after issuing 'hadoop dfs -ls /' or 'hadoop
dfsadmin -report' for a long while
- The page at 'http://namenode:50070' could not be opened as expected...
- ...

I did not find any useful info in the logs, but found the available memory
on the cluster nodes was very low at that time:
- node1 (NN, JT, DN, TT): 158 MB of memory available
- node2 (DN, TT): 75 MB of memory available
- node3 (DN, TT): 174 MB of memory available

I guess the issue with my cluster is caused by a lack of memory, and my
questions are:
- Without running jobs, what are the minimum memory requirements for the
datanode and namenode?
- How do I define the minimum memory for the datanode and namenode?

Thanks!

Sam Liu

Re: The minimum memory requirements to datanode and namenode?

Posted by shashwat shriparv <dw...@gmail.com>.
Due to the small amount of memory available on the nodes, they are not able
to send responses in time, hence the socket exceptions; there may also be
some network issue.

Please check which program is using the memory, as there may be some other
co-hosted application eating it up:

ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS
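(Here -e lists every process, -orss=,args= prints each one's resident memory
in KB followed by its command line with no headers, the numeric sort puts the
biggest consumers at the bottom, and pr -TW$COLUMNS just trims each line to
the terminal width.)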

or

run the top command, then press Shift+M to sort processes by memory usage,
then press c to show full command lines, and check which application is
eating up the memory.

There must be ample memory available on the nodes besides what is reserved
for the JVMs.
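
As a minimal complementary check (a sketch, assuming the JDK's jps tool is on
the PATH):

# list the Hadoop Java daemons that are up (namenode, datanode, tasktracker, ...)
jps -l
# overall memory picture in MB; the '-/+ buffers/cache' row is the real headroom
free -m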

Thanks & Regards

∞
Shashwat Shriparv



On Mon, May 13, 2013 at 12:23 PM, Nitin Pawar <ni...@gmail.com> wrote:

> 4 GB of memory on the NN? This will run out of memory in a few days.
>
> You will need to make sure your NN has at least double the RAM of
> your DNs if you have a miniature cluster.
>
>
> On Mon, May 13, 2013 at 11:52 AM, sam liu <sa...@gmail.com> wrote:
>
>> I can issue the command 'hadoop dfsadmin -report', but it does not return
>> any result for a long time. Also, I can open the NN UI
>> (http://namenode:50070), but it always stays in a connecting state and
>> never returns any cluster statistics.
>>
>> The mem of the NN (in MB):
>>                   total       used       free
>> Mem:          3834       3686        148
>>
>> After running the top command, I can see the following processes taking up
>> the memory: namenode, jobtracker, tasktracker, hbase, ...
>>
>> I can restart the cluster, and then the cluster will be healthy. But this
>> issue will probably occur again a few days later. I think it's caused by a
>> lack of free/available memory, but I do not know how much extra
>> free/available memory per node is required, besides the memory necessary
>> for running the datanode/tasktracker processes.
>>
>>
>>
>>
>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>
>>> Just one node not having memory does not mean your cluster is down.
>>>
>>> Can you see your HDFS health on the NN UI?
>>>
>>> How much memory do you have on the NN? If there are no jobs running on the
>>> cluster, then you can safely restart the datanode and tasktracker.
>>>
>>> Also run the top command and figure out which processes are taking up the
>>> memory, and for what purpose.
>>>
>>>
>>> On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Nitin,
>>>>
>>>> In my cluster, the tasktracker and datanode have already been launched,
>>>> and are still running now. But the free/available memory of node3 is now
>>>> just 167 MB; do you think that is the reason why my Hadoop is unhealthy
>>>> now (it does not return a result for the command 'hadoop dfs -ls /')?
>>>>
>>>>
>>>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>>>
>>>>> Sam,
>>>>>
>>>>> There is no formula for determining how much memory one should give to
>>>>> the datanode and tasktracker. A formula is available for how many slots
>>>>> you want to have on a machine.
>>>>>
>>>>> In my prior experience, we gave 512 MB of memory each to the datanode
>>>>> and tasktracker.
>>>>>
>>>>>
>>>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>>> For node3, the memory is (from 'free -m', values in MB):
>>>>>>              total       used       free     shared    buffers     cached
>>>>>> Mem:          3834       3666        167          0        187       1136
>>>>>> -/+ buffers/cache:       2342       1491
>>>>>> Swap:         8196          0       8196
>>>>>>
>>>>>> For a 3-node cluster like mine, what is the required minimum
>>>>>> free/available memory for the datanode and tasktracker processes,
>>>>>> without running any map/reduce task?
>>>>>> Is there any formula to determine it?
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>
>>>>>>> Can you tell us the specs of node3? In my experience, even on a
>>>>>>> test/demo cluster, anything below 4 GB of RAM makes a node almost
>>>>>>> inaccessible.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Got some exceptions on node3:
>>>>>>>> 1. datanode log:
>>>>>>>> 2013-04-17 11:13:44,719 INFO
>>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>>>>> blk_2478755809192724446_1477 received exception
>>>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>>>>> channel to be ready for read. ch :
>>>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>>>>>>> remote=/9.50.102.79:50010]
>>>>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>>>>> 9.50.102.80:50010,
>>>>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>>>>> ipcPort=50020):DataXceiver
>>>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>>>>> for channel to be ready for read. ch :
>>>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>>>>>>> remote=/9.50.102.79:50010]
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>>>         at
>>>>>>>> java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>>>> 2013-04-17 11:13:44,818 INFO
>>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>>>>> 9.50.102.80:50010
>>>>>>>>
>>>>>>>>
>>>>>>>> 2. tasktracker log:
>>>>>>>> 2013-04-23 11:48:26,783 INFO
>>>>>>>> org.apache.hadoop.mapred.UserLogCleaner: Deleting user log path
>>>>>>>> job_201304152248_0011
>>>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>>>>> Caught exception: java.io.IOException: Call to node1/
>>>>>>>> 9.50.102.81:9001 failed on local exception: java.io.IOException:
>>>>>>>> Connection reset by peer
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown
>>>>>>>> Source)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>>>         at
>>>>>>>> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>>>         at
>>>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>>         at
>>>>>>>> java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>>>         at
>>>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>>>         at
>>>>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>>>
>>>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>>> SHUTDOWN_MSG:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>>>
>>>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit
>>>>>>>>>> any job on it. But, after a few days, I found the cluster was
>>>>>>>>>> unhealthy:
>>>>>>>>>> - No result is returned after issuing 'hadoop dfs -ls /' or
>>>>>>>>>> 'hadoop dfsadmin -report' for a long while
>>>>>>>>>> - The page at 'http://namenode:50070' could not be opened as
>>>>>>>>>> expected...
>>>>>>>>>> - ...
>>>>>>>>>>
>>>>>>>>>> I did not find any useful info in the logs, but found the
>>>>>>>>>> available memory on the cluster nodes was very low at that time:
>>>>>>>>>> - node1 (NN, JT, DN, TT): 158 MB of memory available
>>>>>>>>>> - node2 (DN, TT): 75 MB of memory available
>>>>>>>>>> - node3 (DN, TT): 174 MB of memory available
>>>>>>>>>>
>>>>>>>>>> I guess the issue with my cluster is caused by a lack of memory,
>>>>>>>>>> and my questions are:
>>>>>>>>>> - Without running jobs, what are the minimum memory requirements
>>>>>>>>>> for the datanode and namenode?
>>>>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Sam Liu
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by Nitin Pawar <ni...@gmail.com>.
4 GB of memory on the NN? This will run out of memory in a few days.

You will need to make sure your NN has at least double the RAM of your
DNs if you have a miniature cluster.
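
As an illustrative sketch of how the daemon heaps can be set explicitly
(these are the standard knobs in a Hadoop 1.x conf/hadoop-env.sh; the values
are examples only, and the last -Xmx on a daemon's command line wins):

# conf/hadoop-env.sh
export HADOOP_HEAPSIZE=512                # default heap, in MB, for every daemon
export HADOOP_NAMENODE_OPTS="-Xmx1024m $HADOOP_NAMENODE_OPTS"   # give the NN more
export HADOOP_DATANODE_OPTS="-Xmx512m $HADOOP_DATANODE_OPTS"
export HADOOP_TASKTRACKER_OPTS="-Xmx512m $HADOOP_TASKTRACKER_OPTS"

On top of the daemon heaps, budget for the configured map/reduce slots too:
each running task gets its own child JVM (mapred.child.java.opts, -Xmx200m by
default), plus whatever the OS and any co-hosted services like HBase need.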


On Mon, May 13, 2013 at 11:52 AM, sam liu <sa...@gmail.com> wrote:

> I can issue the command 'hadoop dfsadmin -report', but it does not return any
> result for a long time. Also, I can open the NN UI (http://namenode:50070),
> but it always stays in a connecting state and never returns any
> cluster statistics.
>
> The mem of the NN (in MB):
>                   total       used       free
> Mem:          3834       3686        148
>
> After running the top command, I can see the following processes taking up the
> memory: namenode, jobtracker, tasktracker, hbase, ...
>
> I can restart the cluster, and then the cluster will be healthy. But this
> issue will probably occur again a few days later. I think it's caused by a
> lack of free/available memory, but I do not know how much extra
> free/available memory per node is required, besides the memory necessary for
> running the datanode/tasktracker processes.
>
>
>
>
> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>
>> Just one node not having memory does not mean your cluster is down.
>>
>> Can you see your HDFS health on the NN UI?
>>
>> How much memory do you have on the NN? If there are no jobs running on the
>> cluster, then you can safely restart the datanode and tasktracker.
>>
>> Also run the top command and figure out which processes are taking up the
>> memory, and for what purpose.
>>
>>
>> On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:
>>
>>> Nitin,
>>>
>>> In my cluster, the tasktracker and datanode have already been launched,
>>> and are still running now. But the free/available memory of node3 is now
>>> just 167 MB; do you think that is the reason why my Hadoop is unhealthy
>>> now (it does not return a result for the command 'hadoop dfs -ls /')?
>>>
>>>
>>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>>
>>>> Sam,
>>>>
>>>> There is no formula for determining how much memory one should give to
>>>> the datanode and tasktracker. A formula is available for how many slots
>>>> you want to have on a machine.
>>>>
>>>> In my prior experience, we gave 512 MB of memory each to the datanode and
>>>> tasktracker.
>>>>
>>>>
>>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>>> For node3, the memory is (from 'free -m', values in MB):
>>>>>              total       used       free     shared    buffers     cached
>>>>> Mem:          3834       3666        167          0        187       1136
>>>>> -/+ buffers/cache:       2342       1491
>>>>> Swap:         8196          0       8196
>>>>>
>>>>> For a 3-node cluster like mine, what is the required minimum
>>>>> free/available memory for the datanode and tasktracker processes,
>>>>> without running any map/reduce task?
>>>>> Is there any formula to determine it?
>>>>>
>>>>>
>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>
>>>>>> Can you tell us the specs of node3? In my experience, even on a
>>>>>> test/demo cluster, anything below 4 GB of RAM makes a node almost
>>>>>> inaccessible.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>>>>>
>>>>>>> Got some exceptions on node3:
>>>>>>> 1. datanode log:
>>>>>>> 2013-04-17 11:13:44,719 INFO
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>>>> blk_2478755809192724446_1477 received exception
>>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>>>> channel to be ready for read. ch :
>>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>>>>>> remote=/9.50.102.79:50010]
>>>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>>>> 9.50.102.80:50010,
>>>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>>>> ipcPort=50020):DataXceiver
>>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>>>> for channel to be ready for read. ch :
>>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>>>>>> remote=/9.50.102.79:50010]
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>>         at
>>>>>>> java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>>> 2013-04-17 11:13:44,818 INFO
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>>>> 9.50.102.80:50010
>>>>>>>
>>>>>>>
>>>>>>> 2. tasktracker log:
>>>>>>> 2013-04-23 11:48:26,783 INFO
>>>>>>> org.apache.hadoop.mapred.UserLogCleaner: Deleting user log path
>>>>>>> job_201304152248_0011
>>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>>>> Caught exception: java.io.IOException: Call to node1/
>>>>>>> 9.50.102.81:9001 failed on local exception: java.io.IOException:
>>>>>>> Connection reset by peer
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>>         at
>>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>         at
>>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>>         at
>>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>>         at
>>>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>>
>>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>> SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>>
>>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit
>>>>>>>>> any job on it. But, after a few days, I found the cluster was
>>>>>>>>> unhealthy:
>>>>>>>>> - No result is returned after issuing 'hadoop dfs -ls /' or
>>>>>>>>> 'hadoop dfsadmin -report' for a long while
>>>>>>>>> - The page at 'http://namenode:50070' could not be opened as
>>>>>>>>> expected...
>>>>>>>>> - ...
>>>>>>>>>
>>>>>>>>> I did not find any useful info in the logs, but found the available
>>>>>>>>> memory on the cluster nodes was very low at that time:
>>>>>>>>> - node1 (NN, JT, DN, TT): 158 MB of memory available
>>>>>>>>> - node2 (DN, TT): 75 MB of memory available
>>>>>>>>> - node3 (DN, TT): 174 MB of memory available
>>>>>>>>>
>>>>>>>>> I guess the issue with my cluster is caused by a lack of memory,
>>>>>>>>> and my questions are:
>>>>>>>>> - Without running jobs, what are the minimum memory requirements
>>>>>>>>> for the datanode and namenode?
>>>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Sam Liu
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar

>>>>>>>
>>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>>> SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>>
>>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit
>>>>>>>>> any job on it. But, after a few days, I found the cluster is unhealthy:
>>>>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>>>>> expected...
>>>>>>>>> - ...
>>>>>>>>>
>>>>>>>>> I did not find any useful info in the logs, but found the available
>>>>>>>>> memory of the cluster nodes was very low at that time:
>>>>>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>>>>>> - node2(DN,TT): 75 mb mem is available
>>>>>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>>>>>
>>>>>>>>> I guess the issue of my cluster is caused by a lack of memory,
>>>>>>>>> and my questions are:
>>>>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>>>>> the datanode and namenode?
>>>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Sam Liu
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
I can issue the command 'hadoop dfsadmin -report', but it does not return any
result for a long time. Also, I can open the NN UI (http://namenode:50070),
but it always stays in the connecting status and does not return any
cluster statistics.
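
One thing worth ruling out in this state is the NameNode JVM being stuck in
garbage collection because its heap is nearly full. A minimal sketch of the
check, assuming the JDK tools are on the PATH (<pid> is a placeholder for
whatever jps reports for the NameNode):

jps | grep NameNode         # prints '<pid> NameNode'
jstat -gcutil <pid> 1000 5  # sample GC counters every second, five times

If the FGC (full GC) count climbs on every sample, the daemon is spending its
time collecting garbage instead of answering RPCs, which would explain the
hanging 'hadoop dfsadmin -report' and the unresponsive UI.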

The mem of NN:
                  total       used       free
Mem:          3834       3686        148
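
Note that the 'used' column above includes the Linux page cache, which the
kernel gives back to processes on demand, so the 'free' column understates
what is really available. On the older free(1) that prints a
'-/+ buffers/cache' row (as in the node3 output quoted below), the memory
actually available to processes can be read with a sketch like:

free -m | awk '/buffers\/cache/ {print "actually available:", $4, "MB"}'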

After running a top command, I can see the following processes are taking up
the memory: namenode, jobtracker, tasktracker, hbase, ...
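
To see how much heap each of those daemons is allowed to grow to, rather than
what it happens to hold at the moment, one option is to list the JVM arguments
of every Java process and look for the -Xmx values. A sketch, assuming the
JDK's jps tool is installed:

jps -v | grep -E 'NameNode|JobTracker|TaskTracker|DataNode|HMaster|HRegionServer'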

I can restart the cluster, and then the cluster will be healthy again. But
this issue will probably recur a few days later. I think it's caused by a
lack of free/available memory, but I do not know how much extra
free/available memory a node requires, besides the memory needed to run the
datanode/tasktracker processes.
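
For reference, the per-daemon heap ceilings live in conf/hadoop-env.sh in
Hadoop 1.x. The sketch below only illustrates the mechanism; the -Xmx values
are placeholders, not recommendations for this cluster:

# conf/hadoop-env.sh
export HADOOP_HEAPSIZE=1000   # default max heap, in MB, for all Hadoop daemons
export HADOOP_NAMENODE_OPTS="-Xmx1024m $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx256m $HADOOP_DATANODE_OPTS"
export HADOOP_TASKTRACKER_OPTS="-Xmx256m $HADOOP_TASKTRACKER_OPTS"

Whatever is set there, the daemons still compete with HBase and the OS for the
node's physical RAM, so the configured heaps have to fit into it with room to
spare.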




2013/5/13 Nitin Pawar <ni...@gmail.com>

> just one node not having memory does not mean your cluster is down.
>
> Can you see your hdfs health on NN UI?
>
> how much memory do you have on NN? if there are no jobs running on the
> cluster then you can safely restart datanode and tasktracker.
>
> Also run a top command and figure out which processes are taking up the
> memory and for what purpose?
>
>
> On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:
>
>> Nitin,
>>
>> In my cluster, the tasktracker and datanode have already been launched,
>> and are still running now. But the free/available mem of node3 now is just
>> 167 mb. Do you think that is the reason why my hadoop is unhealthy now (it
>> does not return the result of the command 'hadoop dfs -ls /')?
>>
>>
>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>
>>> Sam,
>>>
>>> There is no formula for determining how much memory one should give to
>>> the datanode and tasktracker. A formula is available only for how many
>>> slots you want to have on a machine.
>>>
>>> In my prior experience, we did give 512MB memory each to a datanode and
>>> tasktracker.
>>>
>>>
>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com>wrote:
>>>
>>>> For node3, the memory is:
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:          3834       3666        167          0        187       1136
>>>> -/+ buffers/cache:       2342       1491
>>>> Swap:         8196          0       8196
>>>>
>>>> For a 3-node cluster like mine, what is the minimum required
>>>> free/available memory for the datanode and tasktracker processes,
>>>> without running any map/reduce task?
>>>> Is there a formula to determine it?
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> Can you tell us the specs of node3? Even on a test/demo cluster, anything
>>>>> below 4 GB RAM makes the node almost inaccessible, in my experience.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Got some exceptions on node3:
>>>>>> 1. datanode log:
>>>>>> 2013-04-17 11:13:44,719 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>>> blk_2478755809192724446_1477 received exception
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>>> channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>>>>> remote=/9.50.102.79:50010]
>>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>>> 9.50.102.80:50010,
>>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>>> ipcPort=50020):DataXceiver
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>>> for channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>>>>> remote=/9.50.102.79:50010]
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>> 2013-04-17 11:13:44,818 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>>> 9.50.102.80:50010
>>>>>>
>>>>>>
>>>>>> 2. tasktracker log:
>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>>>> Deleting user log path job_201304152248_0011
>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001
>>>>>> failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>         at
>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>
>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>
>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>>>>>>> job on it. But, after a few days, I found the cluster is unhealthy:
>>>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>>>> expected...
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> I did not find any useful info in the logs, but found the available
>>>>>>>> memory of the cluster nodes was very low at that time:
>>>>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>>>>> - node2(DN,TT): 75 mb mem is available
>>>>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>>>>
>>>>>>>> I guess the issue of my cluster is caused by a lack of memory,
>>>>>>>> and my questions are:
>>>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>>>> the datanode and namenode?
>>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Sam Liu
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
I can issue a command 'hadoop dfsadmin -report', but it did not return any
result for a long time. Also, I can open the NN UI(http://namenode:50070),
but it is always keeping in the connecting status, and could not return any
cluster statistic.

The mem of NN:
                  total       used       free
Mem:          3834       3686        148

After running a top command, I can see following process are taking up the
memory: namenode, jobtracker, tasktracker, hbase, ...

I can restart the cluster, and then the cluster will be healthy. But this
issue will probably occur in a few days later. I think it's caused by
lacking of free/available mem, but do not know how many extra
free/available mem of node is required, besides the necessary mem for
running datanode/tasktracker process?




2013/5/13 Nitin Pawar <ni...@gmail.com>

> just one node not having memory does not mean your cluster is down.
>
> Can you see your hdfs health on NN UI?
>
> how much memory do you have on NN? if there are no jobs running on the
> cluster then you can safely restart datanode and tasktracker.
>
> Also run a top command and figure out which processes are taking up the
> memory and for what purpose?
>
>
> On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:
>
>> Nitin,
>>
>> In my cluster, the tasktracker and datanode already have been launched,
>> and are still running now. But the free/available mem of node3 now is just
>> 167 mb, and do you think it's the reason why my hadoop is unhealthy now(it
>> does not return result of command 'hadoop dfs -ls /')?
>>
>>
>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>
>>> Sam,
>>>
>>> There is no formula for determining how much memory one should give to
>>> datanode and tasktracker. Ther formula is available for how many slots you
>>> want to have on a machine.
>>>
>>> In my prior experience, we did give 512MB memory each to a datanode and
>>> tasktracker.
>>>
>>>
>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com>wrote:
>>>
>>>> For node3, the memory is:
>>>>                    total       used       free     shared
>>>> buffers     cached
>>>> Mem:          3834       3666        167          0        187
>>>> 1136
>>>> -/+ buffers/cache:       2342       1491
>>>> Swap:         8196          0       8196
>>>>
>>>> To a 3 nodes cluster as mine, what's the required minimum
>>>> free/available memory for the datanode process and tasktracker process,
>>>> without running any map/reduce task?
>>>> Any formula to determine it?
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> can you tell specs of node3. Even on a test/demo cluster, anything
>>>>> below 4 GB ram makes the node almost inaccessible as per my experience.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Got some exceptions on node3:
>>>>>> 1. datanode log:
>>>>>> 2013-04-17 11:13:44,719 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>>> blk_2478755809192724446_1477 received exception
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>>> channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>>>> 9.50.102.79:50010]
>>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>>> 9.50.102.80:50010,
>>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>>> ipcPort=50020):DataXceiver
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>>> for channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>>>> 9.50.102.79:50010]
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>> 2013-04-17 11:13:44,818 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>>> 9.50.102.80:50010
>>>>>>
>>>>>>
>>>>>> 2. tasktracker log:
>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>>>> Deleting user log path job_201304152248_0011
>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>         at
>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>
>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>
>>>>>>> do you get any error when trying to connect to cluster, something
>>>>>>> like 'tried n times' or replicated 0 times.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I setup a cluster with 3 nodes, and after that I did not submit any
>>>>>>>> job on it. But, after few days, I found the cluster is unhealthy:
>>>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>>>> expected...
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> I did not find any usefull info in the logs, but found the avaible
>>>>>>>> memory of the cluster nodes are very low at that time:
>>>>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>>>>> - node2(DN,TT): 75 mb mem is available
>>>>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>>>>
>>>>>>>> I guess the issue of my cluster is caused by lacking of memeory,
>>>>>>>> and my questions are:
>>>>>>>> - Without running jobs, what's the minimum memory requirements to
>>>>>>>> datanode and namenode?
>>>>>>>> - How to define the minimum memeory for datanode and namenode?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Sam Liu
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
I can issue a command 'hadoop dfsadmin -report', but it did not return any
result for a long time. Also, I can open the NN UI(http://namenode:50070),
but it is always keeping in the connecting status, and could not return any
cluster statistic.

The mem of NN:
                  total       used       free
Mem:          3834       3686        148

After running a top command, I can see following process are taking up the
memory: namenode, jobtracker, tasktracker, hbase, ...

I can restart the cluster, and then the cluster will be healthy. But this
issue will probably occur in a few days later. I think it's caused by
lacking of free/available mem, but do not know how many extra
free/available mem of node is required, besides the necessary mem for
running datanode/tasktracker process?




2013/5/13 Nitin Pawar <ni...@gmail.com>

> just one node not having memory does not mean your cluster is down.
>
> Can you see your hdfs health on NN UI?
>
> how much memory do you have on NN? if there are no jobs running on the
> cluster then you can safely restart datanode and tasktracker.
>
> Also run a top command and figure out which processes are taking up the
> memory and for what purpose?
>
>
> On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:
>
>> Nitin,
>>
>> In my cluster, the tasktracker and datanode already have been launched,
>> and are still running now. But the free/available mem of node3 now is just
>> 167 mb, and do you think it's the reason why my hadoop is unhealthy now(it
>> does not return result of command 'hadoop dfs -ls /')?
>>
>>
>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>
>>> Sam,
>>>
>>> There is no formula for determining how much memory one should give to
>>> datanode and tasktracker. Ther formula is available for how many slots you
>>> want to have on a machine.
>>>
>>> In my prior experience, we did give 512MB memory each to a datanode and
>>> tasktracker.
>>>
>>>
>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com>wrote:
>>>
>>>> For node3, the memory is:
>>>>                    total       used       free     shared
>>>> buffers     cached
>>>> Mem:          3834       3666        167          0        187
>>>> 1136
>>>> -/+ buffers/cache:       2342       1491
>>>> Swap:         8196          0       8196
>>>>
>>>> To a 3 nodes cluster as mine, what's the required minimum
>>>> free/available memory for the datanode process and tasktracker process,
>>>> without running any map/reduce task?
>>>> Any formula to determine it?
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> can you tell specs of node3. Even on a test/demo cluster, anything
>>>>> below 4 GB ram makes the node almost inaccessible as per my experience.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Got some exceptions on node3:
>>>>>> 1. datanode log:
>>>>>> 2013-04-17 11:13:44,719 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>>> blk_2478755809192724446_1477 received exception
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>>> channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>>>> 9.50.102.79:50010]
>>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>>> 9.50.102.80:50010,
>>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>>> ipcPort=50020):DataXceiver
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>>> for channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>>>> 9.50.102.79:50010]
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>> 2013-04-17 11:13:44,818 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>>> 9.50.102.80:50010
>>>>>>
>>>>>>
>>>>>> 2. tasktracker log:
>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>>>> Deleting user log path job_201304152248_0011
>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>         at
>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>
>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>
>>>>>>> do you get any error when trying to connect to cluster, something
>>>>>>> like 'tried n times' or replicated 0 times.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I setup a cluster with 3 nodes, and after that I did not submit any
>>>>>>>> job on it. But, after few days, I found the cluster is unhealthy:
>>>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>>>> expected...
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> I did not find any usefull info in the logs, but found the avaible
>>>>>>>> memory of the cluster nodes are very low at that time:
>>>>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>>>>> - node2(DN,TT): 75 mb mem is available
>>>>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>>>>
>>>>>>>> I guess the issue of my cluster is caused by lacking of memeory,
>>>>>>>> and my questions are:
>>>>>>>> - Without running jobs, what's the minimum memory requirements to
>>>>>>>> datanode and namenode?
>>>>>>>> - How to define the minimum memeory for datanode and namenode?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Sam Liu
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
I can issue a command 'hadoop dfsadmin -report', but it did not return any
result for a long time. Also, I can open the NN UI(http://namenode:50070),
but it is always keeping in the connecting status, and could not return any
cluster statistic.

The mem of NN:
                  total       used       free
Mem:          3834       3686        148

After running a top command, I can see following process are taking up the
memory: namenode, jobtracker, tasktracker, hbase, ...

I can restart the cluster, and then the cluster will be healthy. But this
issue will probably occur in a few days later. I think it's caused by
lacking of free/available mem, but do not know how many extra
free/available mem of node is required, besides the necessary mem for
running datanode/tasktracker process?




2013/5/13 Nitin Pawar <ni...@gmail.com>

> just one node not having memory does not mean your cluster is down.
>
> Can you see your hdfs health on NN UI?
>
> how much memory do you have on NN? if there are no jobs running on the
> cluster then you can safely restart datanode and tasktracker.
>
> Also run a top command and figure out which processes are taking up the
> memory and for what purpose?
>
>
> On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:
>
>> Nitin,
>>
>> In my cluster, the tasktracker and datanode already have been launched,
>> and are still running now. But the free/available mem of node3 now is just
>> 167 mb, and do you think it's the reason why my hadoop is unhealthy now(it
>> does not return result of command 'hadoop dfs -ls /')?
>>
>>
>> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>>
>>> Sam,
>>>
>>> There is no formula for determining how much memory one should give to
>>> datanode and tasktracker. Ther formula is available for how many slots you
>>> want to have on a machine.
>>>
>>> In my prior experience, we did give 512MB memory each to a datanode and
>>> tasktracker.
>>>
>>>
>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com>wrote:
>>>
>>>> For node3, the memory is:
>>>>                    total       used       free     shared
>>>> buffers     cached
>>>> Mem:          3834       3666        167          0        187
>>>> 1136
>>>> -/+ buffers/cache:       2342       1491
>>>> Swap:         8196          0       8196
>>>>
>>>> To a 3 nodes cluster as mine, what's the required minimum
>>>> free/available memory for the datanode process and tasktracker process,
>>>> without running any map/reduce task?
>>>> Any formula to determine it?
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> can you tell specs of node3. Even on a test/demo cluster, anything
>>>>> below 4 GB ram makes the node almost inaccessible as per my experience.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Got some exceptions on node3:
>>>>>> 1. datanode log:
>>>>>> 2013-04-17 11:13:44,719 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>>> blk_2478755809192724446_1477 received exception
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>>> channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>>>> 9.50.102.79:50010]
>>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>>> 9.50.102.80:50010,
>>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>>> ipcPort=50020):DataXceiver
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>>> for channel to be ready for read. ch :
>>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>>>> 9.50.102.79:50010]
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>> 2013-04-17 11:13:44,818 INFO
>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>>> 9.50.102.80:50010
>>>>>>
>>>>>>
>>>>>> 2. tasktracker log:
>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>>>> Deleting user log path job_201304152248_0011
>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>         at
>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>         at
>>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>         at
>>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>
>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>>> SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>
>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>>>>>>> job on it. But after a few days, I found the cluster was unhealthy:
>>>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>>>> expected...
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> I did not find any useful info in the logs, but found that the
>>>>>>>> available memory of the cluster nodes was very low at that time:
>>>>>>>> - node1(NN,JT,DN,TT): 158 MB of memory is available
>>>>>>>> - node2(DN,TT): 75 MB of memory is available
>>>>>>>> - node3(DN,TT): 174 MB of memory is available
>>>>>>>>
>>>>>>>> I guess the issue with my cluster is caused by a lack of memory,
>>>>>>>> and my questions are:
>>>>>>>> - Without running jobs, what are the minimum memory requirements
>>>>>>>> for the datanode and namenode?
>>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Sam Liu
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by Nitin Pawar <ni...@gmail.com>.
Just one node running low on memory does not mean your cluster is down.

Can you see your HDFS health on the NN UI?

How much memory do you have on the NN? If there are no jobs running on the
cluster, then you can safely restart the datanode and tasktracker.
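
A minimal sketch of that restart, assuming a Hadoop 1.x layout (daemon
scripts under bin/, hadoop-env.sh under conf/; the -Xmx values below are
illustrative only, not a sizing recommendation):

  # conf/hadoop-env.sh on the slave node -- optional per-daemon heap caps
  export HADOOP_DATANODE_OPTS="-Xmx512m $HADOOP_DATANODE_OPTS"
  export HADOOP_TASKTRACKER_OPTS="-Xmx512m $HADOOP_TASKTRACKER_OPTS"

  # then bounce the two daemons on that node
  bin/hadoop-daemon.sh stop tasktracker
  bin/hadoop-daemon.sh stop datanode
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker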

Also, run the top command and figure out which processes are taking up the
memory, and for what purpose.
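
For example (a sketch assuming a Linux node with the usual procps tools):

  # list processes by resident memory (RSS, in kilobytes), largest first
  ps -eo rss,pid,args --sort=-rss | head -15

  # or interactively: run top, then press Shift+M to sort by memory
  top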


On Mon, May 13, 2013 at 11:28 AM, sam liu <sa...@gmail.com> wrote:

> Nitin,
>
> In my cluster, the tasktracker and datanode have already been launched,
> and are still running now. But the free/available memory of node3 is now
> just 167 MB. Do you think that is the reason why my Hadoop is unhealthy
> now (it does not return a result for the command 'hadoop dfs -ls /')?
>
>
> 2013/5/13 Nitin Pawar <ni...@gmail.com>
>
>> Sam,
>>
>> There is no formula for determining how much memory one should give to
>> the datanode and tasktracker. The formula that does exist is for how many
>> slots you want to have on a machine.
>>
>> In my prior experience, we gave 512 MB of memory each to the datanode and
>> tasktracker.
>>
>>
>> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:
>>
>>> For node3, the memory is:
>>>                    total       used       free     shared    buffers     cached
>>> Mem:          3834       3666        167          0        187       1136
>>> -/+ buffers/cache:       2342       1491
>>> Swap:         8196          0       8196
>>>
>>> For a 3-node cluster like mine, what is the minimum required
>>> free/available memory for the datanode process and the tasktracker
>>> process when no map/reduce tasks are running?
>>> Is there any formula to determine it?
>>>
>>>
>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>
>>>> Can you tell us the specs of node3? Even on a test/demo cluster,
>>>> anything below 4 GB of RAM makes the node almost inaccessible, in my
>>>> experience.
>>>>
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>>> Got some exceptions on node3:
>>>>> 1. datanode log:
>>>>> 2013-04-17 11:13:44,719 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>> blk_2478755809192724446_1477 received exception
>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>> channel to be ready for read. ch :
>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>>>>> 9.50.102.79:50010]
>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>>> 9.50.102.80:50010,
>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>> ipcPort=50020):DataXceiver
>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>> for channel to be ready for read. ch :
>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>>>>> 9.50.102.79:50010]
>>>>>         at
>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>> 2013-04-17 11:13:44,818 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>>> 9.50.102.80:50010
>>>>>
>>>>>
>>>>> 2. tasktracker log:
>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>>> Deleting user log path job_201304152248_0011
>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>         at
>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>         at
>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>         at
>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>         at
>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>         at
>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>
>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>> SHUTDOWN_MSG:
>>>>>
>>>>>
>>>>>
>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>
>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>>>>>> job on it. But after a few days, I found the cluster was unhealthy:
>>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>>> expected...
>>>>>>> - ...
>>>>>>>
>>>>>>> I did not find any useful info in the logs, but found that the
>>>>>>> available memory of the cluster nodes was very low at that time:
>>>>>>> - node1(NN,JT,DN,TT): 158 MB of memory is available
>>>>>>> - node2(DN,TT): 75 MB of memory is available
>>>>>>> - node3(DN,TT): 174 MB of memory is available
>>>>>>>
>>>>>>> I guess the issue with my cluster is caused by a lack of memory, and
>>>>>>> my questions are:
>>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>>> the datanode and namenode?
>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Sam Liu
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
Nitin,

In my cluster, the tasktracker and datanode have already been launched, and
are still running now. But the free/available memory of node3 is now just
167 MB. Do you think that is the reason why my Hadoop is unhealthy now (it
does not return a result for the command 'hadoop dfs -ls /')?
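
One caveat with that 167 MB figure, though: Linux counts page cache as
"used", so the plain "free" column understates what is reclaimable. In the
free output quoted below, the "-/+ buffers/cache" row suggests roughly
1491 MB is still available to applications. A quick way to check (standard
Linux free):

  free -m
  # the "Mem:" row shows free = 167 MB, but the "-/+ buffers/cache" row
  # shows free = 1491 MB once reclaimable cache is counted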


2013/5/13 Nitin Pawar <ni...@gmail.com>

> Sam,
>
> There is no formula for determining how much memory one should give to
> the datanode and tasktracker. The formula that does exist is for how many
> slots you want to have on a machine.
>
> In my prior experience, we gave 512 MB of memory each to the datanode and
> tasktracker.
>
>
> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:
>
>> For node3, the memory is:
>>                    total       used       free     shared    buffers     cached
>> Mem:          3834       3666        167          0        187       1136
>> -/+ buffers/cache:       2342       1491
>> Swap:         8196          0       8196
>>
>> For a 3-node cluster like mine, what is the minimum required
>> free/available memory for the datanode process and the tasktracker
>> process when no map/reduce tasks are running?
>> Is there any formula to determine it?
>>
>>
>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>
>>> Can you tell us the specs of node3? Even on a test/demo cluster,
>>> anything below 4 GB of RAM makes the node almost inaccessible, in my
>>> experience.
>>>
>>>
>>>
>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Got some exceptions on node3:
>>>> 1. datanode log:
>>>> 2013-04-17 11:13:44,719 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>> blk_2478755809192724446_1477 received exception
>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>> channel to be ready for read. ch :
>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>>>> 9.50.102.79:50010]
>>>> 2013-04-17 11:13:44,721 ERROR
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>> 9.50.102.80:50010,
>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>> ipcPort=50020):DataXceiver
>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>> channel to be ready for read. ch :
>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>>>> 9.50.102.79:50010]
>>>>         at
>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>         at java.lang.Thread.run(Thread.java:738)
>>>> 2013-04-17 11:13:44,818 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>> 9.50.102.80:50010
>>>>
>>>>
>>>> 2. tasktracker log:
>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>> Deleting user log path job_201304152248_0011
>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>         at
>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>         at
>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>         at
>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>         at
>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>         at
>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>
>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> Do you get any error when trying to connect to the cluster, something
>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>>>>> job on it. But after a few days, I found the cluster was unhealthy:
>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>> expected...
>>>>>> - ...
>>>>>>
>>>>>> I did not find any useful info in the logs, but found that the
>>>>>> available memory of the cluster nodes was very low at that time:
>>>>>> - node1(NN,JT,DN,TT): 158 MB of memory is available
>>>>>> - node2(DN,TT): 75 MB of memory is available
>>>>>> - node3(DN,TT): 174 MB of memory is available
>>>>>>
>>>>>> I guess the issue with my cluster is caused by a lack of memory, and
>>>>>> my questions are:
>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>> the datanode and namenode?
>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Sam Liu
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
Nitin,

In my cluster, the tasktracker and datanode already have been launched, and
are still running now. But the free/available mem of node3 now is just 167
mb, and do you think it's the reason why my hadoop is unhealthy now(it does
not return result of command 'hadoop dfs -ls /')?


2013/5/13 Nitin Pawar <ni...@gmail.com>

> Sam,
>
> There is no formula for determining how much memory one should give to
> datanode and tasktracker. Ther formula is available for how many slots you
> want to have on a machine.
>
> In my prior experience, we did give 512MB memory each to a datanode and
> tasktracker.
>
>
> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:
>
>> For node3, the memory is:
>>                    total       used       free     shared    buffers
>> cached
>> Mem:          3834       3666        167          0        187       1136
>> -/+ buffers/cache:       2342       1491
>> Swap:         8196          0       8196
>>
>> To a 3 nodes cluster as mine, what's the required minimum free/available
>> memory for the datanode process and tasktracker process, without running
>> any map/reduce task?
>> Any formula to determine it?
>>
>>
>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>
>>> can you tell specs of node3. Even on a test/demo cluster, anything below
>>> 4 GB ram makes the node almost inaccessible as per my experience.
>>>
>>>
>>>
>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Got some exceptions on node3:
>>>> 1. datanode log:
>>>> 2013-04-17 11:13:44,719 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>> blk_2478755809192724446_1477 received exception
>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>> channel to be ready for read. ch :
>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>> 9.50.102.79:50010]
>>>> 2013-04-17 11:13:44,721 ERROR
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>> 9.50.102.80:50010,
>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>> ipcPort=50020):DataXceiver
>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>> channel to be ready for read. ch :
>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>> 9.50.102.79:50010]
>>>>         at
>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>         at java.lang.Thread.run(Thread.java:738)
>>>> 2013-04-17 11:13:44,818 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>> 9.50.102.80:50010
>>>>
>>>>
>>>> 2. tasktracker log:
>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>> Deleting user log path job_201304152248_0011
>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001failed on local exception: java.io.IOException: Connection reset by peer
>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>         at
>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>         at
>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>         at
>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>         at
>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>         at
>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>
>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> do you get any error when trying to connect to cluster, something like
>>>>> 'tried n times' or replicated 0 times.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I setup a cluster with 3 nodes, and after that I did not submit any
>>>>>> job on it. But, after few days, I found the cluster is unhealthy:
>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>> expected...
>>>>>> - ...
>>>>>>
>>>>>> I did not find any usefull info in the logs, but found the avaible
>>>>>> memory of the cluster nodes are very low at that time:
>>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>>> - node2(DN,TT): 75 mb mem is available
>>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>>
>>>>>> I guess the issue of my cluster is caused by lacking of memeory, and
>>>>>> my questions are:
>>>>>> - Without running jobs, what's the minimum memory requirements to
>>>>>> datanode and namenode?
>>>>>> - How to define the minimum memeory for datanode and namenode?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Sam Liu
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
Nitin,

In my cluster, the tasktracker and datanode already have been launched, and
are still running now. But the free/available mem of node3 now is just 167
mb, and do you think it's the reason why my hadoop is unhealthy now(it does
not return result of command 'hadoop dfs -ls /')?


2013/5/13 Nitin Pawar <ni...@gmail.com>

> Sam,
>
> There is no formula for determining how much memory one should give to
> datanode and tasktracker. Ther formula is available for how many slots you
> want to have on a machine.
>
> In my prior experience, we did give 512MB memory each to a datanode and
> tasktracker.
>
>
> On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:
>
>> For node3, the memory is:
>>                    total       used       free     shared    buffers
>> cached
>> Mem:          3834       3666        167          0        187       1136
>> -/+ buffers/cache:       2342       1491
>> Swap:         8196          0       8196
>>
>> To a 3 nodes cluster as mine, what's the required minimum free/available
>> memory for the datanode process and tasktracker process, without running
>> any map/reduce task?
>> Any formula to determine it?
>>
>>
>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>
>>> can you tell specs of node3. Even on a test/demo cluster, anything below
>>> 4 GB ram makes the node almost inaccessible as per my experience.
>>>
>>>
>>>
>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Got some exceptions on node3:
>>>> 1. datanode log:
>>>> 2013-04-17 11:13:44,719 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>> blk_2478755809192724446_1477 received exception
>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>> channel to be ready for read. ch :
>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>> 9.50.102.79:50010]
>>>> 2013-04-17 11:13:44,721 ERROR
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>>> 9.50.102.80:50010,
>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>> ipcPort=50020):DataXceiver
>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>> channel to be ready for read. ch :
>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>>>> 9.50.102.79:50010]
>>>>         at
>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>         at java.lang.Thread.run(Thread.java:738)
>>>> 2013-04-17 11:13:44,818 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>>> 9.50.102.80:50010
>>>>
>>>>
>>>> 2. tasktracker log:
>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>> Deleting user log path job_201304152248_0011
>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001failed on local exception: java.io.IOException: Connection reset by peer
>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>         at
>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>         at
>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>         at
>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>         at
>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>         at
>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>         at
>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>         at
>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>
>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>> SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> do you get any error when trying to connect to cluster, something like
>>>>> 'tried n times' or replicated 0 times.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I setup a cluster with 3 nodes, and after that I did not submit any
>>>>>> job on it. But, after few days, I found the cluster is unhealthy:
>>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>>> 'hadoop dfsadmin -report' for a while
>>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>>> expected...
>>>>>> - ...
>>>>>>
>>>>>> I did not find any usefull info in the logs, but found the avaible
>>>>>> memory of the cluster nodes are very low at that time:
>>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>>> - node2(DN,TT): 75 mb mem is available
>>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>>
>>>>>> I guess the issue of my cluster is caused by lacking of memeory, and
>>>>>> my questions are:
>>>>>> - Without running jobs, what's the minimum memory requirements to
>>>>>> datanode and namenode?
>>>>>> - How to define the minimum memeory for datanode and namenode?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Sam Liu
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

Re: The minimum memory requirements to datanode and namenode?

Posted by Nitin Pawar <ni...@gmail.com>.
Sam,

There is no formula for determining how much memory one should give to the
datanode and tasktracker. A formula is available, though, for how many
slots you want to have on a machine.

In my prior experience, we gave 512 MB of memory each to the datanode and
tasktracker.
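
For reference, in Hadoop 1.x those per-daemon heaps are set in
conf/hadoop-env.sh; below is a minimal sketch of capping them on a small
box like this one (the values are illustrative, not recommendations):

# conf/hadoop-env.sh
# Default heap for every Hadoop daemon started on this node, in MB
export HADOOP_HEAPSIZE=512
# Optionally give the NameNode a larger heap than the worker daemons
export HADOOP_NAMENODE_OPTS="-Xmx1024m $HADOOP_NAMENODE_OPTS"

The slot count is configured separately, via
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml.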


On Mon, May 13, 2013 at 11:18 AM, sam liu <sa...@gmail.com> wrote:

> For node3, the memory is:
>                    total       used       free     shared    buffers     cached
> Mem:          3834       3666        167          0        187       1136
> -/+ buffers/cache:       2342       1491
> Swap:         8196          0       8196
>
> To a 3 nodes cluster as mine, what's the required minimum free/available
> memory for the datanode process and tasktracker process, without running
> any map/reduce task?
> Any formula to determine it?
>
>
> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>
>> can you tell specs of node3. Even on a test/demo cluster, anything below
>> 4 GB ram makes the node almost inaccessible as per my experience.
>>
>>
>>
>> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>>
>>> Got some exceptions on node3:
>>> 1. datanode log:
>>> 2013-04-17 11:13:44,719 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>> blk_2478755809192724446_1477 received exception
>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>>> 9.50.102.79:50010]
>>> 2013-04-17 11:13:44,721 ERROR
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>> 9.50.102.80:50010,
>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>>> 9.50.102.79:50010]
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>         at java.lang.Thread.run(Thread.java:738)
>>> 2013-04-17 11:13:44,818 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>> 9.50.102.80:50010
>>>
>>>
>>> 2. tasktracker log:
>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>> Deleting user log path job_201304152248_0011
>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>> Caused by: java.io.IOException: Connection reset by peer
>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>         at
>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>         at
>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>
>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>> Resending 'status' to 'node1' with reponseId '-12904
>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>> SHUTDOWN_MSG:
>>>
>>>
>>>
>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>
>>>> do you get any error when trying to connect to cluster, something like
>>>> 'tried n times' or replicated 0 times.
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com>wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I setup a cluster with 3 nodes, and after that I did not submit any
>>>>> job on it. But, after few days, I found the cluster is unhealthy:
>>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>>> 'hadoop dfsadmin -report' for a while
>>>>> - The page of 'http://namenode:50070' could not be opened as
>>>>> expected...
>>>>> - ...
>>>>>
>>>>> I did not find any usefull info in the logs, but found the avaible
>>>>> memory of the cluster nodes are very low at that time:
>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>> - node2(DN,TT): 75 mb mem is available
>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>
>>>>> I guess the issue of my cluster is caused by lacking of memeory, and
>>>>> my questions are:
>>>>> - Without running jobs, what's the minimum memory requirements to
>>>>> datanode and namenode?
>>>>> - How to define the minimum memeory for datanode and namenode?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Sam Liu
>>>>>
>>>>
>>>>
>>>
>>
>


-- 
Nitin Pawar

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
For node3, the memory is:
                   total       used       free     shared    buffers     cached
Mem:          3834       3666        167          0        187       1136
-/+ buffers/cache:       2342       1491
Swap:         8196          0       8196
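
Worth noting when reading this output: the first row counts Linux page
cache as "used", so the "-/+ buffers/cache" row is the better gauge of
what processes can actually claim, roughly 1491 MB here rather than 167
MB. Assuming the classic procps free shown above, a one-liner to pull
that number out:

free -m | awk '/buffers\/cache/ {print $4 " MB actually available"}'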

For a 3-node cluster like mine, what is the minimum free/available memory
required for the datanode and tasktracker processes when no map/reduce
tasks are running?
Is there any formula to determine it?
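
Short of a formula, one way to ground the number is to measure what the
daemons themselves hold. A sketch, assuming the JDK's jps tool is on the
PATH and the daemons run as the current user:

# print pid, resident memory (KB) and command line for each Hadoop JVM
for pid in $(jps -q); do
  ps -o pid=,rss=,args= -p "$pid"
done

If the daemons' combined RSS is far below the node's used memory, the
pressure is coming from some co-hosted process rather than Hadoop itself.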


2013/5/13 Rishi Yadav <ri...@infoobjects.com>

> can you tell specs of node3. Even on a test/demo cluster, anything below 4
> GB ram makes the node almost inaccessible as per my experience.
>
>
>
> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>
>> Got some exceptions on node3:
>> 1. datanode log:
>> 2013-04-17 11:13:44,719 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>> blk_2478755809192724446_1477 received exception
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>> 9.50.102.79:50010]
>> 2013-04-17 11:13:44,721 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 9.50.102.80:50010,
>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>> ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
>> 9.50.102.79:50010]
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>         at java.lang.Thread.run(Thread.java:738)
>> 2013-04-17 11:13:44,818 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>> 9.50.102.80:50010
>>
>>
>> 2. tasktracker log:
>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>> Deleting user log path job_201304152248_0011
>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>         at
>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>         at
>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>         at
>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>> Caused by: java.io.IOException: Connection reset by peer
>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>         at
>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>         at
>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>         at
>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>
>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>> Resending 'status' to 'node1' with reponseId '-12904
>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>> SHUTDOWN_MSG:
>>
>>
>>
>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>
>>> do you get any error when trying to connect to cluster, something like
>>> 'tried n times' or replicated 0 times.
>>>
>>>
>>>
>>>
>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I setup a cluster with 3 nodes, and after that I did not submit any job
>>>> on it. But, after few days, I found the cluster is unhealthy:
>>>> - No result returned after issuing command 'hadoop dfs -ls /' or
>>>> 'hadoop dfsadmin -report' for a while
>>>> - The page of 'http://namenode:50070' could not be opened as
>>>> expected...
>>>> - ...
>>>>
>>>> I did not find any usefull info in the logs, but found the avaible
>>>> memory of the cluster nodes are very low at that time:
>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>> - node2(DN,TT): 75 mb mem is available
>>>> - node3(DN,TT): 174 mb mem is available
>>>>
>>>> I guess the issue of my cluster is caused by lacking of memeory, and my
>>>> questions are:
>>>> - Without running jobs, what's the minimum memory requirements to
>>>> datanode and namenode?
>>>> - How to define the minimum memeory for datanode and namenode?
>>>>
>>>> Thanks!
>>>>
>>>> Sam Liu
>>>>
>>>
>>>
>>
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
For node3, the memory is:
                   total       used       free     shared    buffers
cached
Mem:          3834       3666        167          0        187       1136
-/+ buffers/cache:       2342       1491
Swap:         8196          0       8196

To a 3 nodes cluster as mine, what's the required minimum free/available
memory for the datanode process and tasktracker process, without running
any map/reduce task?
Any formula to determine it?


2013/5/13 Rishi Yadav <ri...@infoobjects.com>

> can you tell specs of node3. Even on a test/demo cluster, anything below 4
> GB ram makes the node almost inaccessible as per my experience.
>
>
>
> On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:
>
>> Got some exceptions on node3:
>> 1. datanode log:
>> 2013-04-17 11:13:44,719 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>> blk_2478755809192724446_1477 received exception
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>> 9.50.102.79:50010]
>> 2013-04-17 11:13:44,721 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 9.50.102.80:50010,
>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>> ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371remote=/
>> 9.50.102.79:50010]
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>         at java.lang.Thread.run(Thread.java:738)
>> 2013-04-17 11:13:44,818 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>> 9.50.102.80:50010
>>
>>
>> 2. tasktracker log:
>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>> Deleting user log path job_201304152248_0011
>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>         at
>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>         at
>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>         at
>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>> Caused by: java.io.IOException: Connection reset by peer
>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>         at
>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>         at
>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>         at
>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>
>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>> Resending 'status' to 'node1' with reponseId '-12904
>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>> SHUTDOWN_MSG:
>>
>>
>>
>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>
>>> Do you get any error when trying to connect to the cluster, something
>>> like 'tried n times' or 'replicated 0 times'?
>>>
>>>
>>>
>>>
>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>>> job on it. But, after a few days, I found the cluster was unhealthy:
>>>> - No result returned after issuing the command 'hadoop dfs -ls /' or
>>>> 'hadoop dfsadmin -report' for a while
>>>> - The page at 'http://namenode:50070' could not be opened as
>>>> expected...
>>>> - ...
>>>>
>>>> I did not find any useful info in the logs, but found that the
>>>> available memory of the cluster nodes was very low at that time:
>>>> - node1(NN,JT,DN,TT): 158 MB of memory available
>>>> - node2(DN,TT): 75 MB of memory available
>>>> - node3(DN,TT): 174 MB of memory available
>>>>
>>>> I guess the issue with my cluster is caused by a lack of memory, and
>>>> my questions are:
>>>> - Without running jobs, what are the minimum memory requirements for
>>>> the datanode and namenode?
>>>> - How to determine the minimum memory for the datanode and namenode?
>>>>
>>>> Thanks!
>>>>
>>>> Sam Liu
>>>>
>>>
>>>
>>
>

Re: The minimum memory requirements to datanode and namenode?

Posted by Rishi Yadav <ri...@infoobjects.com>.
Can you tell us the specs of node3? In my experience, even on a test/demo
cluster, anything below 4 GB of RAM makes the node almost inaccessible.
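
For example, from a shell on node3 (standard Linux commands, nothing
Hadoop-specific assumed):

  free -m                            # memory totals, in MB
  grep -c ^processor /proc/cpuinfo   # number of CPU cores
  df -h /                            # disk space on the root volume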



On Sun, May 12, 2013 at 8:25 PM, sam liu <sa...@gmail.com> wrote:

> Got some exceptions on node3:
> 1. datanode log:
> 2013-04-17 11:13:44,719 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_2478755809192724446_1477 received exception
> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
> 9.50.102.79:50010]
> 2013-04-17 11:13:44,721 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 9.50.102.80:50010,
> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
> 9.50.102.79:50010]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>         at java.lang.Thread.run(Thread.java:738)
> 2013-04-17 11:13:44,818 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
> 9.50.102.80:50010
>
>
> 2. tasktracker log:
> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
> Deleting user log path job_201304152248_0011
> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
> exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on
> local exception: java.io.IOException: Connection reset by peer
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>         at
> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>         at
> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
> Caused by: java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>         at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>         at
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>         at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>
> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
> Resending 'status' to 'node1' with reponseId '-12904
> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
>
>
>
> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>
>> Do you get any error when trying to connect to the cluster, something
>> like 'tried n times' or 'replicated 0 times'?
>>
>>
>>
>>
>> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>> job on it. But, after a few days, I found the cluster was unhealthy:
>>> - No result returned after issuing the command 'hadoop dfs -ls /' or
>>> 'hadoop dfsadmin -report' for a while
>>> - The page at 'http://namenode:50070' could not be opened as
>>> expected...
>>> - ...
>>>
>>> I did not find any useful info in the logs, but found that the
>>> available memory of the cluster nodes was very low at that time:
>>> - node1(NN,JT,DN,TT): 158 MB of memory available
>>> - node2(DN,TT): 75 MB of memory available
>>> - node3(DN,TT): 174 MB of memory available
>>>
>>> I guess the issue with my cluster is caused by a lack of memory, and
>>> my questions are:
>>> - Without running jobs, what are the minimum memory requirements for
>>> the datanode and namenode?
>>> - How to determine the minimum memory for the datanode and namenode?
>>>
>>> Thanks!
>>>
>>> Sam Liu
>>>
>>
>>
>

Re: The minimum memory requirements to datanode and namenode?

Posted by sam liu <sa...@gmail.com>.
Got some exceptions on node3:
1. datanode log:
2013-04-17 11:13:44,719 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_2478755809192724446_1477 received exception
java.net.SocketTimeoutException: 63000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
9.50.102.79:50010]
2013-04-17 11:13:44,721 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
9.50.102.80:50010, storageID=DS-2038715921-9.50.102.80-50010-1366091297051,
infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 63000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/
9.50.102.79:50010]
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
        at java.io.DataInputStream.readShort(DataInputStream.java:306)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
        at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
        at java.lang.Thread.run(Thread.java:738)
2013-04-17 11:13:44,818 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
9.50.102.80:50010


2. tasktracker log:
2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
Deleting user log path job_201304152248_0011
2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on
local exception: java.io.IOException: Connection reset by peer
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
        at org.apache.hadoop.ipc.Client.call(Client.java:1112)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
        at
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
        at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
        at sun.nio.ch.IOUtil.read(IOUtil.java:183)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
        at
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.FilterInputStream.read(FilterInputStream.java:127)
        at
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
        at java.io.DataInputStream.readInt(DataInputStream.java:381)
        at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)

2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
Resending 'status' to 'node1' with reponseId '-12904
2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
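
The timeouts above are against 9.50.102.79:50010 (the datanode write)
and node1:9001 (the tasktracker heartbeat). A quick reachability check
I can run from node3, with the ports taken from the logs above and
assuming netcat is installed:

  nc -zv 9.50.102.79 50010   # datanode data-transfer port
  nc -zv node1 9001          # jobtracker RPC port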



2013/5/13 Rishi Yadav <ri...@infoobjects.com>

> Do you get any error when trying to connect to the cluster, something
> like 'tried n times' or 'replicated 0 times'?
>
>
>
>
> On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:
>
>> Hi,
>>
>> I set up a cluster with 3 nodes, and after that I did not submit any
>> job on it. But, after a few days, I found the cluster was unhealthy:
>> - No result returned after issuing the command 'hadoop dfs -ls /' or
>> 'hadoop dfsadmin -report' for a while
>> - The page at 'http://namenode:50070' could not be opened as
>> expected...
>> - ...
>>
>> I did not find any useful info in the logs, but found that the
>> available memory of the cluster nodes was very low at that time:
>> - node1(NN,JT,DN,TT): 158 MB of memory available
>> - node2(DN,TT): 75 MB of memory available
>> - node3(DN,TT): 174 MB of memory available
>>
>> I guess the issue with my cluster is caused by a lack of memory, and
>> my questions are:
>> - Without running jobs, what are the minimum memory requirements for
>> the datanode and namenode?
>> - How to determine the minimum memory for the datanode and namenode?
>>
>> Thanks!
>>
>> Sam Liu
>>
>
>

Re: The minimum memory requirements to datanode and namenode?

Posted by Rishi Yadav <ri...@infoobjects.com>.
Do you get any error when trying to connect to the cluster, something
like 'tried n times' or 'replicated 0 times'?
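
If you are not sure, grepping the client and daemon logs for the usual
retry messages should show it. A sketch, assuming the logs live under
$HADOOP_LOG_DIR (substitute your own log directory):

  grep -iE "Retrying connect|Already tried|replicated to 0 nodes" \
      $HADOOP_LOG_DIR/*.log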




On Sun, May 12, 2013 at 7:28 PM, sam liu <sa...@gmail.com> wrote:

> Hi,
>
> I set up a cluster with 3 nodes, and after that I did not submit any
> job on it. But, after a few days, I found the cluster was unhealthy:
> - No result returned after issuing the command 'hadoop dfs -ls /' or
> 'hadoop dfsadmin -report' for a while
> - The page at 'http://namenode:50070' could not be opened as
> expected...
> - ...
>
> I did not find any useful info in the logs, but found that the
> available memory of the cluster nodes was very low at that time:
> - node1(NN,JT,DN,TT): 158 MB of memory available
> - node2(DN,TT): 75 MB of memory available
> - node3(DN,TT): 174 MB of memory available
>
> I guess the issue with my cluster is caused by a lack of memory, and
> my questions are:
> - Without running jobs, what are the minimum memory requirements for
> the datanode and namenode?
> - How to determine the minimum memory for the datanode and namenode?
>
> Thanks!
>
> Sam Liu
>
