Posted to common-user@hadoop.apache.org by amit handa <am...@gmail.com> on 2009/04/25 07:36:40 UTC

Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

Hi,

We are planning to use Hadoop for some very expensive and long-running
processing tasks.
The computations that we plan to run are very heavy in terms of CPU and
memory requirements, e.g. one process instance takes almost 100% of a CPU (1 core)
and around 300-400 MB of RAM.
The first time the process loads it can take around 1 to 1.5 minutes, but after
that we can provide the data to process and it takes only a few seconds.
Can I model this on Hadoop?
Can I have my processes pre-loaded on the task-processing machines and the
data provided by Hadoop? This would save the 1 to 1.5 minutes of initial load
time that each task would otherwise incur.
I want to run a number of these processes in parallel based on each machine's
capacity (e.g. 6 instances on an 8-CPU box), or using the capacity scheduler.

Please let me know if this is possible, or share any pointers on how it can be
done.

Thanks,
Amit

Re: Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

Posted by Steve Loughran <st...@apache.org>.
Aaron Kimball wrote:
> I'm not aware of any documentation about this particular use case for
> Hadoop. I think your best bet is to look into the JNI documentation about
> loading native libraries, and go from there.
> - Aaron

You could also try

1. Start the main processing app as a process on each machine and leave it
running.

2. Have your mapper (somehow) talk to that running process, passing in
parameters (including local filesystem filenames) to read and write.

You can use RMI or other IPC mechanisms to talk to the long-lived process.
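
For illustration, here is a minimal sketch of step 2 using a plain TCP socket
as the IPC mechanism, written against the old mapred API of that era. It
assumes the preloaded worker process is already running on every node,
listening on localhost port 9099, and answering one result line per request
line; the port and the line-based protocol are made-up details, not something
from this thread.

// Sketch only (not from the thread): the mapper connects to a long-lived
// local worker process in configure() and forwards each record to it.
// The port 9099 and the one-line-request/one-line-reply protocol are assumptions.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LongLivedProcessMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private Socket socket;
  private BufferedReader fromWorker;
  private PrintWriter toWorker;

  @Override
  public void configure(JobConf job) {
    try {
      socket = new Socket("localhost", 9099);              // assumed worker port
      fromWorker = new BufferedReader(
          new InputStreamReader(socket.getInputStream()));
      toWorker = new PrintWriter(socket.getOutputStream(), true);
    } catch (IOException e) {
      throw new RuntimeException("Cannot reach local worker process", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    toWorker.println(value.toString());                    // hand the record to the worker
    String result = fromWorker.readLine();                 // read its answer back
    if (result != null) {
      output.collect(value, new Text(result));
    }
  }

  @Override
  public void close() throws IOException {
    if (socket != null) {
      socket.close();
    }
  }
}

RMI, as mentioned above, or any other local IPC mechanism would slot into
configure() in the same way.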


Re: Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

Posted by Aaron Kimball <aa...@cloudera.com>.
I'm not aware of any documentation about this particular use case for
Hadoop. I think your best bet is to look into the JNI documentation about
loading native libraries, and go from there.
- Aaron


On Sat, Apr 25, 2009 at 10:44 PM, amit handa <am...@gmail.com> wrote:

> Thanks Aaron,
>
> The processing libs that we use, which take time to load are all c++ based
> .so libs.
> Can I invoke it from JVM during the configure stage of the mapper and keep
> it running as you suggested ?
> Can you point me to some documentation regarding the same ?
>
> Regards,
> Amit
>
> On Sat, Apr 25, 2009 at 1:42 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>
> > Amit,
> >
> > This can be made to work with Hadoop. Basically, in your mapper's
> > "configure" stage it would do the heavy load-in process, then it would
> > process your individual work items as records during the actual "map"
> > stage.
> > A map task can be comprised of many records, so you'll be fine here.
> >
> > If you use Hadoop 0.19 or 0.20, you can also enable JVM reuse, where
> > multiple map tasks are performed serially in the same JVM instance. In
> this
> > case, the first task in the JVM would do the heavy load-in process into
> > static fields or other globally-accessible items; subsequent tasks could
> > recognize that the system state is already initialized and would not need
> > to
> > repeat it.
> >
> > The number of mapper/reducer tasks that run in parallel on a given node
> can
> > be configured with a simple setting; setting this to 6 will work just
> fine.
> > The capacity / fairshare schedulers are not what you need here -- their
> > main
> > function is to ensure that multiple jobs (separate sets of tasks) can all
> > make progress simultaneously by sharing cluster resources across jobs
> > rather
> > than running jobs in a FIFO fashion.
> >
> > - Aaron
> >
> > On Sat, Apr 25, 2009 at 2:36 PM, amit handa <am...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We are planning to use hadoop for some very expensive and long running
> > > processing tasks.
> > > The computing nodes that we plan to use are very heavy in terms of CPU
> > and
> > > memory requirement e.g one process instance takes almost 100% CPU (1
> > core)
> > > and around 300 -400 MB of RAM.
> > > The first time the process loads it can take around 1-1:30 minutes but
> > > after
> > > that we can provide the data to process and it takes few seconds to
> > > process.
> > > Can I model it on hadoop ?
> > > Can I have my processes pre-loaded on the task processing machines and
> > the
> > > data be provided by hadoop? This will save the 1-1:30 minutes of initial
> > > load
> > > time that it would otherwise take for each task.
> > > I want to run a number of these processes in parallel  based on the
> > > machines
> > > capacity (e.g 6 instances on a 8 cpu box) or using capacity scheduler.
> > >
> > > Please let me know if this is possible or any pointers to how it can be
> > > done
> > > ?
> > >
> > > Thanks,
> > > Amit
> > >
> >
>

Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>.
I have contacted the administrator of our cluster and he gave me
access. Now my program works in fully-distributed mode.

Thanks a lot.

Jasmine
----- Original Message ----- 
From: "jason hadoop" <ja...@gmail.com>
To: <co...@hadoop.apache.org>
Sent: Sunday, April 26, 2009 12:13 PM
Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid 
Engine


> It may be that the sun grid is similar to the EC2 and the machines have an
> internal IPaddress/name that MUST be used for inter machine communication
> and an external IPaddress/name that is only for internet access.
>
> The above overly complex sentence basically states there may be some
> firewall rules/tools in the sun grid that you need to be aware of and use.
>
> On Sun, Apr 26, 2009 at 6:31 AM, Jasmine (Xuanjing) Huang <
> xjhuang@cs.umass.edu> wrote:
>
>> Hi, Jason,
>>
>> Thanks for your advice, after insert port into the file of
>> "hadoop-site.xml", I can start namenode and run job now.
>> But my system works only when I  set localhost to masters and add 
>> localhost
>> (as well as some other nodes) to slaves file. And all the tasks are
>> Data-local map tasks. I wonder if whether I enter fully distributed mode, 
>> or
>> still in pseudo mode.
>>
>> As for the SGE, I am only a user and know little about it. This is the 
>> user
>> manual of our cluster:
>> http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc
>>
>> Best,
>> Jasmine
>>
>> ----- Original Message ----- From: "jason hadoop" 
>> <ja...@gmail.com>
>> To: <co...@hadoop.apache.org>
>> Sent: Sunday, April 26, 2009 12:06 AM
>> Subject: Re: Can't start fully-distributed operation of Hadoop in Sun 
>> Grid
>> Engine
>>
>>
>>
>>  the parameter you specify for fs.default.name should be of the form
>>> hdfs://host:port and the parameter you specify for the 
>>> mapred.job.tracker
>>> MUST be host:port. I haven't looked at 18.3,  but it appears that the
>>> :port
>>> is mandatory.
>>>
>>> In your case, the piece of code parsing the fs.default.name variable is
>>> not
>>> able to tokenize it into protocol host and port correctly
>>>
>>> recap:
>>> fs.default.name hdfs://namenodeHost:port
>>> mapred.job.tracker jobtrackerHost:port
>>> specify all the parts above and try again.
>>>
>>> Can you please point me at information on using the sun grid, I want to
>>> include a paragraph or two about it in my book.
>>>
>>> On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
>>> xjhuang@cs.umass.edu> wrote:
>>>
>>>  Hi, there,
>>>>
>>>> My hadoop system (version: 0.18.3) works well under standalone and
>>>> pseudo-distributed operation. But if I try to run hadoop in
>>>> fully-distributed mode in Sun Grid Engine, Hadoop always failed -- in
>>>> fact,
>>>> the jobTracker and TaskTracker can be started, but the namenode and
>>>> secondary namenode cannot be started. Could anyone help me with it?
>>>>
>>>> My SGE scripts looks like:
>>>>
>>>> #!/bin/bash
>>>> #$ -cwd
>>>> #$ -S /bin/bash
>>>> #$ -l long=TRUE
>>>> #$ -v JAVA_HOME=/usr/java/latest
>>>> #$ -v HADOOP_HOME=*********
>>>> #$ -pe hadoop 6
>>>> PATH="$HADOOP_HOME/bin:$PATH"
>>>> hadoop fs -put ********
>>>> hadoop jar *****
>>>> hadoop fs -get *********
>>>>
>>>> Then the output looks like:
>>>> Exception in thread "main" java.lang.NumberFormatException: For input
>>>> string: ""
>>>>      at
>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.
>>>> java:48)
>>>>      at java.lang.Integer.parseInt(Integer.java:468)
>>>>      at java.lang.Integer.parseInt(Integer.java:497)
>>>>      at
>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>>      at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>>      at
>>>> org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFil
>>>> eSystem.java:66)
>>>>      at
>>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339
>>>> )
>>>>      at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>>>      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>>>      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>>>      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>>>      at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>>>      at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>      at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
>>>>
>>>> And the log of NameNode looks like
>>>> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: 
>>>> STARTUP_MSG:
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = ************
>>>> STARTUP_MSG:   args = []
>>>> STARTUP_MSG:   version = 0.18.3
>>>> ************************************************************/
>>>> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
>>>> java.lang.NumberFormatException: For i
>>>> nput string: ""
>>>>      at
>>>>
>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>>      at java.lang.Integer.parseInt(Integer.java:468)
>>>>      at java.lang.Integer.parseInt(Integer.java:497)
>>>>      at
>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>>      at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>>      at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>>>>      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>>>>      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>>>>      at 
>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>>>>      at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>>>>
>>>> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode:
>>>> SHUTDOWN_MSG:
>>>> /************************************************************
>>>> SHUTDOWN_MSG: Shutting down NameNode at ***************
>>>>
>>>> Best,
>>>> Jasmine
>>>>
>>>>
>>>>
>>>
>>> --
>>> Alpha Chapters of my book on Hadoop are available
>>> http://www.apress.com/book/view/9781430219422
>>>
>>>
>>
>
>
> -- 
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
> 


Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by jason hadoop <ja...@gmail.com>.
It may be that the Sun grid is similar to EC2, in that the machines have an
internal IP address/name that MUST be used for inter-machine communication
and an external IP address/name that is only for internet access.

The above overly complex sentence basically states that there may be some
firewall rules/tools in the Sun grid that you need to be aware of and use.

On Sun, Apr 26, 2009 at 6:31 AM, Jasmine (Xuanjing) Huang <
xjhuang@cs.umass.edu> wrote:

> Hi, Jason,
>
> Thanks for your advice, after insert port into the file of
> "hadoop-site.xml", I can start namenode and run job now.
> But my system works only when I  set localhost to masters and add localhost
> (as well as some other nodes) to slaves file. And all the tasks are
> Data-local map tasks. I wonder if whether I enter fully distributed mode, or
> still in pseudo mode.
>
> As for the SGE, I am only a user and know little about it. This is the user
> manual of our cluster:
> http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc
>
> Best,
> Jasmine
>
> ----- Original Message ----- From: "jason hadoop" <ja...@gmail.com>
> To: <co...@hadoop.apache.org>
> Sent: Sunday, April 26, 2009 12:06 AM
> Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid
> Engine
>
>
>
>  the parameter you specify for fs.default.name should be of the form
>> hdfs://host:port and the parameter you specify for the mapred.job.tracker
>> MUST be host:port. I haven't looked at 18.3,  but it appears that the
>> :port
>> is mandatory.
>>
>> In your case, the piece of code parsing the fs.default.name variable is
>> not
>> able to tokenize it into protocol host and port correctly
>>
>> recap:
>> fs.default.name hdfs://namenodeHost:port
>> mapred.job.tracker jobtrackerHost:port
>> specify all the parts above and try again.
>>
>> Can you please point me at information on using the sun grid, I want to
>> include a paragraph or two about it in my book.
>>
>> On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
>> xjhuang@cs.umass.edu> wrote:
>>
>>  Hi, there,
>>>
>>> My hadoop system (version: 0.18.3) works well under standalone and
>>> pseudo-distributed operation. But if I try to run hadoop in
>>> fully-distributed mode in Sun Grid Engine, Hadoop always failed -- in
>>> fact,
>>> the jobTracker and TaskTracker can be started, but the namenode and
>>> secondary namenode cannot be started. Could anyone help me with it?
>>>
>>> My SGE scripts looks like:
>>>
>>> #!/bin/bash
>>> #$ -cwd
>>> #$ -S /bin/bash
>>> #$ -l long=TRUE
>>> #$ -v JAVA_HOME=/usr/java/latest
>>> #$ -v HADOOP_HOME=*********
>>> #$ -pe hadoop 6
>>> PATH="$HADOOP_HOME/bin:$PATH"
>>> hadoop fs -put ********
>>> hadoop jar *****
>>> hadoop fs -get *********
>>>
>>> Then the output looks like:
>>> Exception in thread "main" java.lang.NumberFormatException: For input
>>> string: ""
>>>      at
>>> java.lang.NumberFormatException.forInputString(NumberFormatException.
>>> java:48)
>>>      at java.lang.Integer.parseInt(Integer.java:468)
>>>      at java.lang.Integer.parseInt(Integer.java:497)
>>>      at
>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>      at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>      at
>>> org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFil
>>> eSystem.java:66)
>>>      at
>>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339
>>> )
>>>      at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>>      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>>      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>>      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>>      at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>>      at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>      at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
>>>
>>> And the log of NameNode looks like
>>> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG:   host = ************
>>> STARTUP_MSG:   args = []
>>> STARTUP_MSG:   version = 0.18.3
>>> ************************************************************/
>>> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.NumberFormatException: For i
>>> nput string: ""
>>>      at
>>>
>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>      at java.lang.Integer.parseInt(Integer.java:468)
>>>      at java.lang.Integer.parseInt(Integer.java:497)
>>>      at
>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>>      at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>>      at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>>>      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>>>      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>>>      at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>>>      at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>>>
>>> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode:
>>> SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at ***************
>>>
>>> Best,
>>> Jasmine
>>>
>>>
>>>
>>
>> --
>> Alpha Chapters of my book on Hadoop are available
>> http://www.apress.com/book/view/9781430219422
>>
>>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>.
Hi, Jason,

Thanks for your advice. After inserting the port into
"hadoop-site.xml", I can start the namenode and run jobs now.
But my system works only when I set localhost in the masters file and add localhost
(as well as some other nodes) to the slaves file. And all the tasks are
data-local map tasks. I wonder whether I have entered fully-distributed mode or am
still in pseudo-distributed mode.

As for the SGE, I am only a user and know little about it. This is the user 
manual of our cluster: 
http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc

Best,
Jasmine

----- Original Message ----- 
From: "jason hadoop" <ja...@gmail.com>
To: <co...@hadoop.apache.org>
Sent: Sunday, April 26, 2009 12:06 AM
Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid 
Engine


> the parameter you specify for fs.default.name should be of the form
> hdfs://host:port and the parameter you specify for the mapred.job.tracker
> MUST be host:port. I haven't looked at 18.3,  but it appears that the 
> :port
> is mandatory.
>
> In your case, the piece of code parsing the fs.default.name variable is 
> not
> able to tokenize it into protocol host and port correctly
>
> recap:
> fs.default.name hdfs://namenodeHost:port
> mapred.job.tracker jobtrackerHost:port
> specify all the parts above and try again.
>
> Can you please point me at information on using the sun grid, I want to
> include a paragraph or two about it in my book.
>
> On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
> xjhuang@cs.umass.edu> wrote:
>
>> Hi, there,
>>
>> My hadoop system (version: 0.18.3) works well under standalone and
>> pseudo-distributed operation. But if I try to run hadoop in
>> fully-distributed mode in Sun Grid Engine, Hadoop always failed -- in 
>> fact,
>> the jobTracker and TaskTracker can be started, but the namenode and
>> secondary namenode cannot be started. Could anyone help me with it?
>>
>> My SGE scripts looks like:
>>
>> #!/bin/bash
>> #$ -cwd
>> #$ -S /bin/bash
>> #$ -l long=TRUE
>> #$ -v JAVA_HOME=/usr/java/latest
>> #$ -v HADOOP_HOME=*********
>> #$ -pe hadoop 6
>> PATH="$HADOOP_HOME/bin:$PATH"
>> hadoop fs -put ********
>> hadoop jar *****
>> hadoop fs -get *********
>>
>> Then the output looks like:
>> Exception in thread "main" java.lang.NumberFormatException: For input
>> string: ""
>>       at
>> java.lang.NumberFormatException.forInputString(NumberFormatException.
>> java:48)
>>       at java.lang.Integer.parseInt(Integer.java:468)
>>       at java.lang.Integer.parseInt(Integer.java:497)
>>       at 
>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>       at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>       at
>> org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFil
>> eSystem.java:66)
>>       at
>> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339
>> )
>>       at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>       at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
>>
>> And the log of NameNode looks like
>> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = ************
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.18.3
>> ************************************************************/
>> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.NumberFormatException: For i
>> nput string: ""
>>       at
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>       at java.lang.Integer.parseInt(Integer.java:468)
>>       at java.lang.Integer.parseInt(Integer.java:497)
>>       at 
>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>>       at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>>       at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>>       at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>>       at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>>       at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>>       at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>>
>> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode: 
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at ***************
>>
>> Best,
>> Jasmine
>>
>>
>
>
> -- 
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
> 


Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by jason hadoop <ja...@gmail.com>.
The parameter you specify for fs.default.name should be of the form
hdfs://host:port, and the parameter you specify for mapred.job.tracker
MUST be host:port. I haven't looked at 18.3, but it appears that the :port
is mandatory.

In your case, the piece of code parsing the fs.default.name variable is not
able to tokenize it into protocol, host, and port correctly.

Recap:
fs.default.name    hdfs://namenodeHost:port
mapred.job.tracker jobtrackerHost:port
Specify all the parts above and try again.
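
As a sketch, the corresponding hadoop-site.xml entries would look something
like the fragment below; the hostnames and port numbers are placeholders, not
values taken from this thread.

<!-- hadoop-site.xml fragment; hostnames and ports are placeholders -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenodehost:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>jobtrackerhost:9001</value>
</property>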

Can you please point me at information on using the sun grid, I want to
include a paragraph or two about it in my book.

On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang <
xjhuang@cs.umass.edu> wrote:

> Hi, there,
>
> My hadoop system (version: 0.18.3) works well under standalone and
> pseudo-distributed operation. But if I try to run hadoop in
> fully-distributed mode in Sun Grid Engine, Hadoop always failed -- in fact,
> the jobTracker and TaskTracker can be started, but the namenode and
> secondary namenode cannot be started. Could anyone help me with it?
>
> My SGE scripts looks like:
>
> #!/bin/bash
> #$ -cwd
> #$ -S /bin/bash
> #$ -l long=TRUE
> #$ -v JAVA_HOME=/usr/java/latest
> #$ -v HADOOP_HOME=*********
> #$ -pe hadoop 6
> PATH="$HADOOP_HOME/bin:$PATH"
> hadoop fs -put ********
> hadoop jar *****
> hadoop fs -get *********
>
> Then the output looks like:
> Exception in thread "main" java.lang.NumberFormatException: For input
> string: ""
>       at
> java.lang.NumberFormatException.forInputString(NumberFormatException.
> java:48)
>       at java.lang.Integer.parseInt(Integer.java:468)
>       at java.lang.Integer.parseInt(Integer.java:497)
>       at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>       at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>       at
> org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFil
> eSystem.java:66)
>       at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339
> )
>       at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>       at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)
>
> And the log of NameNode looks like
> 2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = ************
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.3
> ************************************************************/
> 2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For i
> nput string: ""
>       at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>       at java.lang.Integer.parseInt(Integer.java:468)
>       at java.lang.Integer.parseInt(Integer.java:497)
>       at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
>       at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
>       at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
>       at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
>       at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
>       at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
>       at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)
>
> 2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at ***************
>
> Best,
> Jasmine
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Can't start fully-distributed operation of Hadoop in Sun Grid Engine

Posted by "Jasmine (Xuanjing) Huang" <xj...@cs.umass.edu>.
Hi, there,

My Hadoop system (version 0.18.3) works well under standalone and
pseudo-distributed operation. But if I try to run Hadoop in
fully-distributed mode in Sun Grid Engine, it always fails -- in fact,
the JobTracker and TaskTracker can be started, but the NameNode and
secondary NameNode cannot be started. Could anyone help me with it?

My SGE script looks like:

#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -l long=TRUE
#$ -v JAVA_HOME=/usr/java/latest
#$ -v HADOOP_HOME=*********
#$ -pe hadoop 6
PATH="$HADOOP_HOME/bin:$PATH"
hadoop fs -put ********
hadoop jar *****
hadoop fs -get *********

Then the output looks like:
Exception in thread "main" java.lang.NumberFormatException: For input 
string: ""
        at 
java.lang.NumberFormatException.forInputString(NumberFormatException.
java:48)
        at java.lang.Integer.parseInt(Integer.java:468)
        at java.lang.Integer.parseInt(Integer.java:497)
        at 
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
        at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
        at 
org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFil
eSystem.java:66)
        at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339
)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
        at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)

And the log of NameNode looks like
2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ************
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3
************************************************************/
2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode: 
java.lang.NumberFormatException: For i
nput string: ""
        at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:468)
        at java.lang.Integer.parseInt(Integer.java:497)
        at 
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
        at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:136)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:830)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:839)

2009-04-25 17:27:17,149 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ***************

Best,
Jasmine


Re: Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

Posted by jason hadoop <ja...@gmail.com>.
Static, pinned items persist across JVM reuse.
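
For the C++ .so case Amit asks about, a minimal sketch of that idea; the
library name "procengine" and the ensureLoaded() helper are hypothetical, not
anything from this thread. The library is loaded once from a guarded static
helper, and with JVM reuse enabled every later task in the same JVM skips the
expensive load.

// Sketch only: pin an expensive native library in static state so it is
// loaded at most once per JVM. "procengine" is a hypothetical library name
// (i.e. libprocengine.so on the task tracker's java.library.path).
public class PinnedEngine {
  private static boolean loaded = false;

  public static synchronized void ensureLoaded() {
    if (!loaded) {
      System.loadLibrary("procengine"); // the 1-1.5 minute load happens here, once per JVM
      loaded = true;
    }
  }
}

A mapper would call PinnedEngine.ensureLoaded() from its configure() method;
with JVM reuse turned on, only the first task in each reused JVM pays the
load cost.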

On Sat, Apr 25, 2009 at 6:44 AM, amit handa <am...@gmail.com> wrote:

> Thanks Aaron,
>
> The processing libs that we use, which take time to load are all c++ based
> .so libs.
> Can I invoke it from JVM during the configure stage of the mapper and keep
> it running as you suggested ?
> Can you point me to some documentation regarding the same ?
>
> Regards,
> Amit
>
> On Sat, Apr 25, 2009 at 1:42 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>
> > Amit,
> >
> > This can be made to work with Hadoop. Basically, in your mapper's
> > "configure" stage it would do the heavy load-in process, then it would
> > process your individual work items as records during the actual "map"
> > stage.
> > A map task can be comprised of many records, so you'll be fine here.
> >
> > If you use Hadoop 0.19 or 0.20, you can also enable JVM reuse, where
> > multiple map tasks are performed serially in the same JVM instance. In
> this
> > case, the first task in the JVM would do the heavy load-in process into
> > static fields or other globally-accessible items; subsequent tasks could
> > recognize that the system state is already initialized and would not need
> > to
> > repeat it.
> >
> > The number of mapper/reducer tasks that run in parallel on a given node
> can
> > be configured with a simple setting; setting this to 6 will work just
> fine.
> > The capacity / fairshare schedulers are not what you need here -- their
> > main
> > function is to ensure that multiple jobs (separate sets of tasks) can all
> > make progress simultaneously by sharing cluster resources across jobs
> > rather
> > than running jobs in a FIFO fashion.
> >
> > - Aaron
> >
> > On Sat, Apr 25, 2009 at 2:36 PM, amit handa <am...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We are planning to use hadoop for some very expensive and long running
> > > processing tasks.
> > > The computing nodes that we plan to use are very heavy in terms of CPU
> > and
> > > memory requirement e.g one process instance takes almost 100% CPU (1
> > core)
> > > and around 300 -400 MB of RAM.
> > > The first time the process loads it can take around 1-1:30 minutes but
> > > after
> > > that we can provide the data to process and it takes few seconds to
> > > process.
> > > Can I model it on hadoop ?
> > > Can I have my processes pre-loaded on the task processing machines and
> > the
> > > data be provided by hadoop? This will save the 1-1:30 minutes of initial
> > > load
> > > time that it would otherwise take for each task.
> > > I want to run a number of these processes in parallel  based on the
> > > machines
> > > capacity (e.g 6 instances on a 8 cpu box) or using capacity scheduler.
> > >
> > > Please let me know if this is possible or any pointers to how it can be
> > > done
> > > ?
> > >
> > > Thanks,
> > > Amit
> > >
> >
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422

Re: Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

Posted by amit handa <am...@gmail.com>.
Thanks Aaron,

The processing libs that we use, which take time to load, are all C++-based
.so libs.
Can I invoke them from the JVM during the configure stage of the mapper and keep
them running as you suggested?
Can you point me to some documentation about this?

Regards,
Amit

On Sat, Apr 25, 2009 at 1:42 PM, Aaron Kimball <aa...@cloudera.com> wrote:

> Amit,
>
> This can be made to work with Hadoop. Basically, in your mapper's
> "configure" stage it would do the heavy load-in process, then it would
> process your individual work items as records during the actual "map"
> stage.
> A map task can be comprised of many records, so you'll be fine here.
>
> If you use Hadoop 0.19 or 0.20, you can also enable JVM reuse, where
> multiple map tasks are performed serially in the same JVM instance. In this
> case, the first task in the JVM would do the heavy load-in process into
> static fields or other globally-accessible items; subsequent tasks could
> recognize that the system state is already initialized and would not need
> to
> repeat it.
>
> The number of mapper/reducer tasks that run in parallel on a given node can
> be configured with a simple setting; setting this to 6 will work just fine.
> The capacity / fairshare schedulers are not what you need here -- their
> main
> function is to ensure that multiple jobs (separate sets of tasks) can all
> make progress simultaneously by sharing cluster resources across jobs
> rather
> than running jobs in a FIFO fashion.
>
> - Aaron
>
> On Sat, Apr 25, 2009 at 2:36 PM, amit handa <am...@gmail.com> wrote:
>
> > Hi,
> >
> > We are planning to use hadoop for some very expensive and long running
> > processing tasks.
> > The computing nodes that we plan to use are very heavy in terms of CPU
> and
> > memory requirement e.g one process instance takes almost 100% CPU (1
> core)
> > and around 300 -400 MB of RAM.
> > The first time the process loads it can take around 1-1:30 minutes but
> > after
> > that we can provide the data to process and it takes few seconds to
> > process.
> > Can I model it on hadoop ?
> > Can I have my processes pre-loaded on the task processing machines and
> the
> > > data be provided by hadoop? This will save the 1-1:30 minutes of initial
> > load
> > time that it would otherwise take for each task.
> > I want to run a number of these processes in parallel  based on the
> > machines
> > capacity (e.g 6 instances on a 8 cpu box) or using capacity scheduler.
> >
> > Please let me know if this is possible or any pointers to how it can be
> > done
> > ?
> >
> > Thanks,
> > Amit
> >
>

Re: Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

Posted by Aaron Kimball <aa...@cloudera.com>.
Amit,

This can be made to work with Hadoop. Basically, your mapper's
"configure" stage would do the heavy load-in, and then it would
process your individual work items as records during the actual "map" stage.
A map task can comprise many records, so you'll be fine here.

If you use Hadoop 0.19 or 0.20, you can also enable JVM reuse, where
multiple map tasks are performed serially in the same JVM instance. In this
case, the first task in the JVM would do the heavy load-in process into
static fields or other globally-accessible items; subsequent tasks could
recognize that the system state is already initialized and would not need to
repeat it.
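
A minimal sketch of that pattern, using the old (0.19/0.20-era) mapred API;
the Engine class and its loadExpensive()/process() methods are hypothetical
stand-ins for the real resource, not anything from this thread.

// Sketch only: heavy load-in done once per JVM in configure(), kept in a
// static field so it survives JVM reuse. Engine is a hypothetical stand-in.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class HeavyInitMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Globally accessible; later tasks in a reused JVM find it already set.
  private static Engine engine;

  @Override
  public void configure(JobConf job) {
    synchronized (HeavyInitMapper.class) {
      if (engine == null) {
        engine = Engine.loadExpensive();   // the slow, roughly 1-1.5 minute step
      }
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(value, new Text(engine.process(value.toString())));
  }
}

// Hypothetical placeholder for the expensive resource.
class Engine {
  static Engine loadExpensive() { return new Engine(); } // stands in for the slow load
  String process(String record) { return record; }       // stands in for the real work
}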

The number of mapper/reducer tasks that run in parallel on a given node can
be configured with a simple setting; setting this to 6 will work just fine.
The capacity / fairshare schedulers are not what you need here -- their main
function is to ensure that multiple jobs (separate sets of tasks) can all
make progress simultaneously by sharing cluster resources across jobs rather
than running jobs in a FIFO fashion.
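
For reference, a sketch of the two knobs touched on above as they might
appear in hadoop-site.xml; the values are illustrative only (6 parallel map
slots per node, unlimited JVM reuse), not recommendations from this thread.

<!-- hadoop-site.xml fragment; values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>   <!-- e.g. 6 concurrent map tasks on an 8-core box -->
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>  <!-- 0.19/0.20: reuse each task JVM for an unlimited number of tasks -->
</property>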

- Aaron

On Sat, Apr 25, 2009 at 2:36 PM, amit handa <am...@gmail.com> wrote:

> Hi,
>
> We are planning to use hadoop for some very expensive and long running
> processing tasks.
> The computing nodes that we plan to use are very heavy in terms of CPU and
> memory requirement e.g one process instance takes almost 100% CPU (1 core)
> and around 300 -400 MB of RAM.
> The first time the process loads it can take around 1-1:30 minutes but
> after
> that we can provide the data to process and it takes few seconds to
> process.
> Can I model it on hadoop ?
> Can I have my processes pre-loaded on the task processing machines and the
> data be provided by hadoop? This will save the 1-1:30 minutes of initial
> load
> time that it would otherwise take for each task.
> I want to run a number of these processes in parallel  based on the
> machines
> capacity (e.g 6 instances on a 8 cpu box) or using capacity scheduler.
>
> Please let me know if this is possible or any pointers to how it can be
> done
> ?
>
> Thanks,
> Amit
>