Posted to common-user@hadoop.apache.org by Robert Krüger <kr...@signal7.de> on 2008/06/12 11:02:15 UTC
Programmatically initializing and starting HDFS cluster
Hi,
for our developers I would like to write a few lines of Java code that,
given a base directory, set up an HDFS filesystem, initialize it if it
does not exist yet, and then start the service(s) in-process. This is
to run on each developer's machine, probably within a Tomcat instance.
If I can avoid it, I would rather not do this with a bunch of shell scripts.
Could anyone point me to code samples that do similar things, or give any
other hints that would make this easier than looking at what the
command-line tools do and reverse-engineering it from there?
Thanks in advance,
Robert
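A possible starting point for the "initialize it if it does not exist yet" part, as a minimal sketch rather than a tested recipe. It assumes the base directory is split into "name" and "data" subdirectories (a layout made up for this example) and relies on the fact that a formatted NameNode storage directory contains a current/VERSION file. The class DevHdfsLayout and its methods are hypothetical names, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: decides whether a developer's HDFS base directory still needs formatting. */
final class DevHdfsLayout {
    private DevHdfsLayout() {}

    /** Directory we would configure as the NameNode storage directory (assumed layout). */
    static Path nameDir(Path baseDir) {
        return baseDir.resolve("name");
    }

    /** Directory we would configure for DataNode block storage (assumed layout). */
    static Path dataDir(Path baseDir) {
        return baseDir.resolve("data");
    }

    /** A formatted NameNode storage directory contains current/VERSION. */
    static boolean isFormatted(Path baseDir) {
        return Files.isRegularFile(nameDir(baseDir).resolve("current").resolve("VERSION"));
    }

    /** Creates the directories if they are missing and reports whether a format is still needed. */
    static boolean prepare(Path baseDir) throws IOException {
        Files.createDirectories(nameDir(baseDir));
        Files.createDirectories(dataDir(baseDir));
        return !isFormatted(baseDir);
    }
}
```

The boolean returned by prepare() could then be passed as the format flag to whatever starts the daemons in-process. Hadoop's own unit tests do this with a test class called MiniDFSCluster, which starts a NameNode and DataNodes inside one JVM and takes a flag saying whether to format first. Its package and constructor differ between Hadoop versions, so check the source of the release you are using.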
Re: Programmatically initializing and starting HDFS cluster
Posted by Chris Collins <ch...@scoutlabs.com>.
I am also interested in this option, since I will probably be
hacking at such a thing in the next few weeks.
I am also curious whether you can run MR jobs in-process rather than
launching them each time. The scenario is one where initialization takes
far too long to repeat for every map-reduce shard. For example, say you
are trying to compute the top n terms within a set of documents, where
the top n are the rarest terms according to some model corpus. Perhaps
you have a df index, or perhaps you have a huge NLP engine that is used
for entity extraction. Either of these needs a chunk of memory and a
chunk of time to initialize on each pass.
Here, of course, you would need not only to specify the job but also to
somehow constrain the candidate nodes it can run on, based upon their
ability to run it.
C
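One way to picture the "pay the initialization cost once" idea is a per-JVM holder that builds the expensive object on first use and hands the same instance to every later caller. This is only a sketch in plain Java. PerJvmResource is a hypothetical name, and it only helps when several tasks really do run inside the same process, which is the open question above.

```java
import java.util.function.Supplier;

/** Hypothetical per-JVM cache for a resource that is expensive to build (df index, NLP engine, ...). */
final class PerJvmResource<T> {
    private final Supplier<T> loader;
    private volatile T instance;

    PerJvmResource(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the shared instance, running the loader at most once if it returns a non-null value. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = loader.get();   // the slow part, e.g. loading the model into memory
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

A task would keep one such holder in a static field and call get() during its setup. The first task in the process pays the full cost, and later tasks in the same process get the already-built object.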
On Jun 12, 2008, at 2:02 AM, Robert Krüger wrote:
>
> Hi,
>
> for our developers I would like to write a few lines of Java code
> that, given a base directory, sets up an HDFS filesystem,
> initializes it, if it is not there yet and then starts the
> service(s) in process. This is to run on each developer's machine,
> probably within a tomcat instance. I don't want to do this (if I
> don't have to) in a bunch of shell scripts.
>
> Could anyone point to code samples that do similar things or give
> any other hints that make this easier than to look at what the
> Command line tools do and reverse engineer it from there?
>
> Thanks in advance,
>
> Robert