Posted to common-user@hadoop.apache.org by C G <pa...@yahoo.com> on 2007/12/17 20:53:22 UTC

Problem bringing up TaskTracker on slave nodes...

Hi All:
   
  I am migrating from a small grid to a larger one.  The small grid runs fine with no issues.  On the larger grid, with nearly identical configuration files (only host names and file paths changed), I can get dfs to run, but not all the TaskTrackers.  Specifically, the task trackers on the slave nodes fail to initialize with a bind error against the master node's address on port 0.
   
  The task tracker logs are below for the master (which starts up successfully) and from one of the slaves (which fails to start).  Any thoughts/comments would be most appreciated.
   
  Note that if I log in to the worker node, and using Python do a  socket.connect() to the master on the master's port (43913 for this run) I can connect successfully.  
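
  A connectivity check of that kind might look like the following minimal sketch. The helper name can_connect and the 3-second timeout are illustrative assumptions, not the exact commands that were run.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

  Run on the worker as can_connect("master", 43913), this only shows that the master's port is reachable from the worker. It says nothing about whether the worker can bind its own listening socket, which is the step that fails in the log.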
   
  How do the slave nodes know what port to use when connecting? 
   
  Any help appreciated, as I am tearing out what little is left of my hair :-).
   
  Thanks, 
  C G
   

       

Re: Problem bringing up TaskTracker on slave nodes...

Posted by C G <pa...@yahoo.com>.
Just to close the loop on this, and to make sure someone else doesn't have the same problem, this turned out to be a case of cockpit error.
   
  I had misread the documentation for mapred.task.tracker.report.bindAddress and had set it to point to the master node.  I should have left it set to 127.0.0.1 and instead set mapred.job.tracker to point to the master node.  Once I realized the error of my ways I changed it, and everything came up on the first try.
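
  For anyone who wants to sanity-check the same two settings, here is a rough Python sketch that reads a Hadoop-style configuration file and flags this mistake. The function names, the sample XML and the job tracker value master:9001 are hypothetical illustrations (the port used on this grid is not given in the thread); only the two property names come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Values treated as "local" for the report bind address (an assumption of this sketch).
LOOPBACK = {"127.0.0.1", "localhost"}

# Hypothetical sample that reproduces the mistake described in this thread.
SAMPLE = """<configuration>
  <property>
    <name>mapred.task.tracker.report.bindAddress</name>
    <value>master</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> dict:
    """Collect <property><name>/<value> pairs into a dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_tracker_settings(props: dict) -> list:
    """Return human-readable warnings for the two settings in question."""
    warnings = []
    bind = props.get("mapred.task.tracker.report.bindAddress")
    if bind is not None and bind not in LOOPBACK:
        warnings.append(
            f"mapred.task.tracker.report.bindAddress is '{bind}'; "
            "a worker can only bind to one of its own addresses"
        )
    if "mapred.job.tracker" not in props:
        warnings.append("mapred.job.tracker is not set to the master node")
    return warnings


for warning in check_tracker_settings(read_properties(SAMPLE)):
    print(warning)
```

  The check is deliberately narrow: it only looks at the two properties involved here and does not validate anything else in the file.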
   
  I could nit about the documentation needing to be a bit more clear, but I'm writing this off as mostly a case of error due to sleep deprivation.
   
  Thanks,
  C G
  

Arun C Murthy <ar...@yahoo-inc.com> wrote:
  > 2007-12-17 14:03:59,756 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Problem binding to master/10.2.13.1:0
>         at org.apache.hadoop.ipc.Server.bind(Server.java:163)
>         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:215)
>         at org.apache.hadoop.ipc.Server.<init>(Server.java:657)
>         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:363)
>         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:333)
>         at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:409)
>         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:717)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1880)

That tells me you need to check who is running on that port...

Arun


Re: Problem bringing up TaskTracker on slave nodes...

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
 > 2007-12-17 14:03:59,756 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Problem binding to master/10.2.13.1:0
 >         at org.apache.hadoop.ipc.Server.bind(Server.java:163)
 >         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:215)
 >         at org.apache.hadoop.ipc.Server.<init>(Server.java:657)
 >         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:363)
 >         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:333)
 >         at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:409)
 >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:717)
 >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1880)

That tells me you need to check who is running on that port...

Arun



Re: Problem bringing up TaskTracker on slave nodes...

Posted by C G <pa...@yahoo.com>.
Sorry, forgot to paste these in.....
   
   
  2007-12-17 14:03:59,371 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = master/10.2.13.1
STARTUP_MSG:   args = []
************************************************************/
2007-12-17 14:03:59,481 INFO org.mortbay.util.Credential: Checking Resource aliases
2007-12-17 14:03:59,531 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2007-12-17 14:03:59,531 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2007-12-17 14:03:59,531 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2007-12-17 14:03:59,712 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@1f82982
2007-12-17 14:03:59,739 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2007-12-17 14:03:59,742 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50060
2007-12-17 14:03:59,742 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@aa37a6
2007-12-17 14:03:59,747 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
2007-12-17 14:03:59,756 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 43913: starting
2007-12-17 14:03:59,756 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 43913: starting
2007-12-17 14:03:59,757 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 43913: starting
2007-12-17 14:03:59,757 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 43913: starting
2007-12-17 14:03:59,757 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: /10.2.13.1:43913
2007-12-17 14:03:59,757 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker master:/10.2.13.1:43913
2007-12-17 14:03:59,759 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 43913: starting
2007-12-17 14:03:59,784 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on master:/10.2.13.1:43913
  
2007-12-17 14:03:59,370 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG:   host = worker1/10.2.14.1
STARTUP_MSG:   args = []
************************************************************/
2007-12-17 14:03:59,481 INFO org.mortbay.util.Credential: Checking Resource aliases
2007-12-17 14:03:59,529 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2007-12-17 14:03:59,530 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2007-12-17 14:03:59,530 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2007-12-17 14:03:59,714 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@1f82982
2007-12-17 14:03:59,741 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2007-12-17 14:03:59,744 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50060
2007-12-17 14:03:59,744 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@aa37a6
2007-12-17 14:03:59,749 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
2007-12-17 14:03:59,756 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.BindException: Problem binding to master/10.2.13.1:0
        at org.apache.hadoop.ipc.Server.bind(Server.java:163)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:215)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:657)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:363)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:333)
        at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:409)
        at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:717)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1880)
  2007-12-17 14:03:59,756 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at worker1/10.2.14.1
************************************************************/
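
A note on reading the error above: port 0 asks the operating system for any free port, so the failure is more likely about the address than about a port conflict. The log is from worker1 (10.2.14.1), yet the bind target is master/10.2.13.1, which is not an address the worker owns. A minimal Python sketch of the same effect, assuming ordinary OS behaviour where binding to a non-local address is refused; can_bind is an illustrative helper, not Hadoop code:

```python
import socket


def can_bind(host: str) -> bool:
    """Try to bind a TCP socket to (host, 0); port 0 means 'any free port'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, 0))
            return True
        except OSError:
            # Raised when the address is not assigned to this machine
            # or the host name cannot be resolved.
            return False
```

On the worker, can_bind("127.0.0.1") should succeed, while can_bind("10.2.13.1") would be expected to fail unless the OS has been configured to allow non-local binds.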

