Posted to common-user@hadoop.apache.org by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu> on 2009/04/26 15:31:11 UTC

Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Hi, Jason,

Thanks for your advice. After inserting the port into "hadoop-site.xml", 
I can start the namenode and run jobs now.
But my system only works when I set localhost in the masters file and add 
localhost (as well as some other nodes) to the slaves file. And all the 
tasks are data-local map tasks. I wonder whether I have entered fully 
distributed mode, or am still in pseudo-distributed mode.
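[Editor's note: for reference, the two entries being discussed take roughly 
this shape in hadoop-site.xml. The hostnames and port numbers below are 
placeholders, not values from this cluster; 9000 and 9001 were common 
conventions at the time.]

```xml
<?xml version="1.0"?>
<!-- Hypothetical hadoop-site.xml sketch (Hadoop 0.18.x property names). -->
<!-- namenodeHost, jobtrackerHost, and both ports are placeholders. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenodeHost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtrackerHost:9001</value>
  </property>
</configuration>
```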

As for the SGE, I am only a user and know little about it. This is the user 
manual of our cluster: 
http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc

Best,
Jasmine

----- Original Message ----- 
From: "jason hadoop" <ja...@gmail.com>
To: <co...@hadoop.apache.org>
Sent: Sunday, April 26, 2009 12:06 AM
Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid 
Engine


> The parameter you specify for fs.default.name should be of the form
> hdfs://host:port, and the parameter you specify for mapred.job.tracker
> MUST be host:port. I haven't looked at 0.18.3, but it appears that the
> :port is mandatory.
>
> In your case, the piece of code parsing the fs.default.name variable is
> not able to tokenize it into protocol, host, and port correctly.
>
> Recap:
> fs.default.name      hdfs://namenodeHost:port
> mapred.job.tracker   jobtrackerHost:port
> Specify all the parts above and try again.
>
> Can you please point me at information on using the Sun grid? I want to
> include a paragraph or two about it in my book.
>
> On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
> xjhuang@cs.umass.edu> wrote:
>
>> Hi, there,
>>
>> My Hadoop system (version 0.18.3) works well in standalone and
>> pseudo-distributed operation. But if I try to run Hadoop in
>> fully-distributed mode in Sun Grid Engine, Hadoop always fails -- in
>> fact, the JobTracker and TaskTracker can be started, but the namenode and
>> secondary namenode cannot be started. Could anyone help me with it?
>>
>> My SGE script looks like:
>>
>> #!/bin/bash
>> #$ -cwd
>> #$ -S /bin/bash
>> #$ -l long=TRUE
>> #$ -v JAVA_HOME=/usr/java/latest
>> #$ -v HADOOP_HOME=*********
>> #$ -pe hadoop 6
>> PATH="$HADOOP_HOME/bin:$PATH"
>> hadoop fs -put ********
>> hadoop jar *****
>> hadoop fs -get *********
>>
>> Then the output looks like:
>> Exception in thread "main" java.lang.NumberFormatException: For input
>> string: ""
>>       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>       at java.lang.Integer.parseInt(Integer.java:468)
>>       at java.lang.Integer.parseInt(Integer.java:497)
>>       at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>       at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>       at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:66)
>>       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
>>       at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>       at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
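[Editor's note: the exception above comes from the address string being split 
on ':' and the (empty) port substring being handed to Integer.parseInt. The 
sketch below is illustrative only, not Hadoop's actual NetUtils code, but it 
reproduces the same failure mode when the port is missing.]

```java
// Minimal sketch of why an address with no port fails: splitting on ':'
// yields an empty port substring, and Integer.parseInt("") throws
// NumberFormatException: For input string: ""
public class PortParseDemo {
    static int parsePort(String target) {
        int colon = target.lastIndexOf(':');
        // With "namenodeHost:" (or no port configured), this substring
        // is "", which Integer.parseInt rejects.
        return Integer.parseInt(target.substring(colon + 1));
    }

    public static void main(String[] args) {
        System.out.println(parsePort("namenodeHost:9000")); // prints 9000
        try {
            parsePort("namenodeHost:");
        } catch (NumberFormatException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```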
>>
>> And the NameNode log looks like:
>> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = ************
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.18.3
>> ************************************************************/
>> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.NumberFormatException: For input string: ""
>>       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>       at java.lang.Integer.parseInt(Integer.java:468)
>>       at java.lang.Integer.parseInt(Integer.java:497)
>>       at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>       at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>       at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>>       at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>>       at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>>       at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>>       at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>>
>> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode: 
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at ***************
>>
>> Best,
>> Jasmine
>>
>>
>
>
> -- 
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
> 


Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>.
I have contacted the administrator of our cluster and he gave me access. 
Now my program works in fully distributed mode.

Thanks a lot.

Jasmine


Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by jason hadoop <ja...@gmail.com>.
It may be that the Sun grid is similar to EC2, where the machines have an
internal IP address/name that MUST be used for inter-machine communication
and an external IP address/name that is only for internet access.

In other words, there may be some firewall rules/tools in the Sun grid that
you need to be aware of and use.
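[Editor's note: a quick way to check for such an internal-vs-external 
mismatch is to see what each hostname in the masters/slaves files resolves 
to from each node. This small helper is an editor's sketch, not part of 
Hadoop; run it on every node and compare the results.]

```java
import java.net.InetAddress;

// Hypothetical helper: print the address a hostname resolves to on this
// node. If the same name resolves differently on different nodes (or to
// an external address the firewall blocks), daemons cannot reach each other.
public class AddrCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(host + " resolves to " + addr.getHostAddress());
    }
}
```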



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422