Posted to common-user@hadoop.apache.org by Robert Krüger <kr...@signal7.de> on 2008/06/12 11:02:15 UTC

Programmatically initializing and starting HDFS cluster

Hi,

For our developers, I would like to write a few lines of Java code that,
given a base directory, sets up an HDFS file system in it, initializes it
if it is not there yet, and then starts the service(s) in-process. This is
to run on each developer's machine, probably within a Tomcat instance. I
don't want to do this in a bunch of shell scripts if I don't have to.

Could anyone point me to code samples that do something similar, or give
any other hints that make this easier than looking at what the command-line
tools do and reverse-engineering it from there?

Thanks in advance,

Robert
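A minimal sketch of the JDK-only bookkeeping half of such a helper, assuming the property names in use at the time (fs.default.name, dfs.name.dir, dfs.data.dir, dfs.replication) and assuming that a formatted name directory can be recognized by its current/VERSION file. DevHdfsLayout and its methods are hypothetical. With Hadoop on the classpath, the returned values could be copied into an org.apache.hadoop.conf.Configuration with conf.set(key, value); the format and start calls themselves are deliberately left out because their classes and signatures have changed between releases (the MiniDFSCluster class in Hadoop's test sources is one place to see how the project itself starts HDFS in-process).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: derives a single-node HDFS layout from one base
 * directory and decides whether the name directory still needs formatting.
 * It only prepares directories and property values; it does not start Hadoop.
 */
class DevHdfsLayout {
    final Path baseDir;
    final Path nameDir; // NameNode metadata
    final Path dataDir; // DataNode block storage

    DevHdfsLayout(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath();
        this.nameDir = this.baseDir.resolve("name");
        this.dataDir = this.baseDir.resolve("data");
    }

    /** Property values to copy into a Hadoop Configuration (assumed 2008-era names). */
    Map<String, String> properties(int nameNodePort) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.default.name", "hdfs://localhost:" + nameNodePort);
        props.put("dfs.name.dir", nameDir.toString());
        props.put("dfs.data.dir", dataDir.toString());
        props.put("dfs.replication", "1"); // a developer box runs a single DataNode
        return props;
    }

    /** Assumption: a formatted name directory contains current/VERSION. */
    boolean needsFormat() {
        return !Files.isRegularFile(nameDir.resolve("current").resolve("VERSION"));
    }

    /** Creates the base and data directories if they are missing. */
    void createDirectories() {
        try {
            Files.createDirectories(dataDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be roughly: build the layout in a ServletContextListener when Tomcat starts, call createDirectories(), format only when needsFormat() returns true, and then start the NameNode and DataNode with the property values above.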

Re: Programmatically initializing and starting HDFS cluster

Posted by Chris Collins <ch...@scoutlabs.com>.
I am also interested in this option, since I will probably be
hacking at such a thing in the next few weeks.

I am also curious whether you can run MapReduce jobs in-process rather
than having them launched from scratch each time. The scenario is one
where initialization simply takes too long for a MapReduce shard to be
executed in this model. For example, say you are trying to compute the
top n terms within a set of documents, where the top n are the rarest
terms relative to some model corpus. Perhaps you have a document-frequency
(df) index, or perhaps you have a huge NLP engine that's used for entity
extraction; any of these needs a chunk of memory and a chunk of time to
initialize on each pass.
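The cost described here is per-task initialization, and one common way to pay it only once is to keep the loaded resource in a JVM-wide, lazily initialized holder so that every later task in the same JVM reuses it. A minimal sketch, assuming the df index fits in memory; DfIndexHolder, RarestTerms and the tiny in-code table are hypothetical stand-ins for a real index loaded from disk. This only helps if several tasks actually run in the same JVM, which depends on the framework and its configuration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical JVM-wide holder for an expensive, read-only df (document-frequency) table. */
class DfIndexHolder {
    // Counts how often the expensive load ran; should stay at 1 per JVM.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Initialization-on-demand holder: Lazy is initialized on the first call to get(),
    // exactly once, and the JVM's class-initialization locking makes that thread-safe.
    private static final class Lazy {
        static final Map<String, Integer> INDEX = load();
    }

    static Map<String, Integer> get() {
        return Lazy.INDEX;
    }

    private static Map<String, Integer> load() {
        LOADS.incrementAndGet();
        // Stand-in for the slow part, e.g. reading a large df index from local disk.
        Map<String, Integer> df = new HashMap<>();
        df.put("the", 1000);
        df.put("hadoop", 12);
        df.put("tomcat", 7);
        df.put("namenode", 3);
        return df;
    }
}

class RarestTerms {
    /** Returns up to n distinct terms of a document, lowest df first; unknown terms count as df 0. */
    static List<String> topRarest(List<String> docTerms, int n) {
        Map<String, Integer> df = DfIndexHolder.get(); // cheap after the first call
        List<String> distinct = new ArrayList<>(new TreeSet<>(docTerms)); // alphabetical
        // List.sort is stable, so terms with equal df stay in alphabetical order.
        distinct.sort(Comparator.comparingInt(t -> df.getOrDefault(t, 0)));
        return new ArrayList<>(distinct.subList(0, Math.min(n, distinct.size())));
    }
}
```

Treating terms missing from the index as df 0 ranks them as the rarest; that is a design choice, and skipping them instead would be equally defensible.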

Of course, here you would need not only to specify the job but also
somehow to constrain the candidate nodes it can run on, based on their
ability to run it.
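Constraining candidate nodes in this way amounts to a capability filter applied before tasks are assigned. A hypothetical sketch: NodeInfo, NodeFilter.eligibleHosts and the two capabilities (free heap and a locally installed NLP model) are invented for illustration and are not Hadoop classes; how such a filter would be wired into a scheduler is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical description of a worker node's capabilities (not a Hadoop class). */
class NodeInfo {
    final String host;
    final long freeHeapMb;      // memory available for the task's resident data
    final boolean hasNlpModel;  // whether the entity-extraction model is installed locally

    NodeInfo(String host, long freeHeapMb, boolean hasNlpModel) {
        this.host = host;
        this.freeHeapMb = freeHeapMb;
        this.hasNlpModel = hasNlpModel;
    }
}

class NodeFilter {
    /** Returns the hosts that have enough free heap and, if required, the NLP model. */
    static List<String> eligibleHosts(List<NodeInfo> nodes, long requiredHeapMb, boolean needsNlpModel) {
        List<String> hosts = new ArrayList<>();
        for (NodeInfo node : nodes) {
            boolean enoughMemory = node.freeHeapMb >= requiredHeapMb;
            boolean hasModel = !needsNlpModel || node.hasNlpModel;
            if (enoughMemory && hasModel) {
                hosts.add(node.host);
            }
        }
        return hosts;
    }
}
```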

C
