Posted to common-user@hadoop.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2012/05/24 00:44:04 UTC

3 machine cluster trouble

I have a two-machine cluster and am adding a new machine. The new node 
has a different location for hadoop.tmp.dir than the other two, and its 
datanode refuses to start when brought up in the cluster. When I change 
hadoop.tmp.dir to point to the same location on all machines, everything 
starts up fine.

Shouldn't I be able to have the master and slave1 set as:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

And slave2 set as:
<property>
<name>hadoop.tmp.dir</name>
<value>/media/d2/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

??? Slave2 runs standalone in single node mode just fine. Using 0.20.205.

Re: 3 machine cluster trouble

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Oops, after a few trials I got an ERROR for incompatible build 
versions. Copied the code from the master, reformatted, et voilà.

On 5/24/12 11:34 AM, Pat Ferrel wrote:
> [...]

Re: 3 machine cluster trouble

Posted by Pat Ferrel <pa...@occamsmachete.com>.
ok, so all nodes are configured the same except for master/slave 
differences. They are all running HDFS, and all daemons seem to be 
running when I do a start-all.sh from the master. However, the master 
Map/Reduce Administration page shows only two live nodes; the HDFS page 
shows 3.

Looking at the log files on the new slave node I see no outright errors, 
but I do see the following in the tasktracker log. All machines have 8G 
of memory. I think the important part below is that the TaskTracker's 
totalMemoryAllottedForTasks is -1. I've searched for others with this 
problem but haven't found anything matching my case, which is just 
trying to start up. No tasks have been run.

2012-05-24 11:20:46,786 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting tracker tracker_occam3:localhost/127.0.0.1:45700
2012-05-24 11:20:46,792 INFO org.apache.hadoop.mapred.TaskTracker: 
Starting thread: Map-events fetcher for all reduce tasks on 
tracker_occam3:localhost/127.0.0.1:45700
2012-05-24 11:20:46,792 INFO org.apache.hadoop.mapred.TaskTracker:  
Using ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5abd09e8
2012-05-24 11:20:46,795 WARN org.apache.hadoop.mapred.TaskTracker: 
TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is 
disabled.
2012-05-24 11:20:46,795 INFO org.apache.hadoop.mapred.IndexCache: 
IndexCache created with max memory = 10485760
2012-05-24 11:20:46,800 INFO org.apache.hadoop.mapred.TaskTracker: 
Shutting down: Map-events fetcher for all reduce tasks on 
tracker_occam3:localhost/127.0.0.1:45700
2012-05-24 11:20:46,800 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
java.lang.InterruptedException: sleep interrupted
     at java.lang.Thread.sleep(Native Method)
     at 
org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)
2012-05-24 11:20:46,900 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 45700
2012-05-24 11:20:46,901 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 3 on 45700: exiting
2012-05-24 11:20:46,901 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 1 on 45700: exiting
2012-05-24 11:20:46,902 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 2 on 45700: exiting
2012-05-24 11:20:46,902 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server listener on 45700
2012-05-24 11:20:46,901 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 0 on 45700: exiting
2012-05-24 11:20:46,904 INFO 
org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2012-05-24 11:20:46,904 INFO org.apache.hadoop.mapred.TaskTracker: 
Shutting down StatusHttpServer
2012-05-24 11:20:46,904 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 7 on 45700: exiting
2012-05-24 11:20:46,903 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 6 on 45700: exiting
2012-05-24 11:20:46,903 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 4 on 45700: exiting
2012-05-24 11:20:46,904 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 45700: exiting
2012-05-24 11:20:46,904 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server Responder
2012-05-24 11:20:46,909 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50060
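
(A side note on that WARN line: totalMemoryAllottedForTasks = -1 just means the per-task memory limits are unset, so the TaskMemoryManager is disabled; on its own it should not stop a TaskTracker from registering. For reference only, a sketch of the 0.20.x mapred-site.xml properties that enable memory monitoring — the property names are from the 0.20-era memory-monitoring feature, and the values below are purely illustrative, not a recommendation for this cluster:

```xml
<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>1024</value> <!-- memory assumed per map slot -->
</property>
<property>
  <name>mapred.cluster.reduce.memory.mb</name>
  <value>1024</value> <!-- memory assumed per reduce slot -->
</property>
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value> <!-- default per-job limit for a map task -->
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>1024</value> <!-- default per-job limit for a reduce task -->
</property>
```

With all four unset, the TaskTracker computes -1 and logs exactly the warning shown above.)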



On 5/23/12 3:55 PM, James Warren wrote:
> [...]

Re: 3 machine cluster trouble

Posted by James Warren <da...@gmail.com>.
Hi Pat -

The setting for hadoop.tmp.dir is used both locally and on HDFS and
therefore should be consistent across your cluster.

http://stackoverflow.com/questions/2354525/what-should-be-hadoop-tmp-dir
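
(If the underlying goal is just to put slave2's storage on its second drive, one option — sketched here as a suggestion, not something from this thread — is to keep hadoop.tmp.dir identical on every node and override only the storage directories on slave2, since in 0.20.x both default to subdirectories of hadoop.tmp.dir. The paths below are illustrative, patterned on the /media/d2 mount mentioned above:

```xml
<!-- hdfs-site.xml on slave2 only: where the DataNode keeps its blocks
     (default is ${hadoop.tmp.dir}/dfs/data) -->
<property>
  <name>dfs.data.dir</name>
  <value>/media/d2/app/hadoop/data</value>
</property>

<!-- mapred-site.xml on slave2 only: TaskTracker scratch space
     (default is ${hadoop.tmp.dir}/mapred/local) -->
<property>
  <name>mapred.local.dir</name>
  <value>/media/d2/app/hadoop/mapred/local</value>
</property>
```

That way the cluster-wide value of hadoop.tmp.dir stays consistent while the heavy data lands on the bigger disk.)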

cheers,
-James

On Wed, May 23, 2012 at 3:44 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> [...]