Posted to hdfs-user@hadoop.apache.org by Chris Embree <ce...@gmail.com> on 2013/01/22 20:24:40 UTC

Where do/should .jar files live?

Hi List,

This should be a simple question, I think. Disclosure: I am not a Java
developer. ;)

We're getting ready to build our Dev and Prod clusters. I'm pretty
comfortable with HDFS and how it sits atop several local file systems on
multiple servers. I'm fairly comfortable with the concept of Map/Reduce,
why it's cool, and why we want it.

Now for the question. Where should my developers put and store their jar
files? Or, asked another way, what's the best entry point for submitting
jobs?

We have separate physical systems for the NN, Checkpoint Node (formerly
2NN), Job Tracker, and Standby NN. Should I run from the JT node? Do I
keep all of my finished .jars on the JT's local file system?
Or should I expect that jobs will be run via Oozie? Do I put jars on the
Oozie server's local FS?

Thanks in advance.
Chris

Re: Where do/should .jar files live?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
On top of what Bejoy said, I just wanted to add that when you submit a
job to Hadoop using the hadoop jar command, the jars you reference in the
command on the edge/client node are picked up by Hadoop and made
available to the cluster nodes where the mappers and reducers run.

Thanks
Hemanth
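
For illustration, a submission from the edge/client node might look like
the sketch below. The jar name, main class, and paths are made up for the
example; only the hadoop jar command itself and the generic -libjars
option (which requires the driver to use ToolRunner/GenericOptionsParser)
are standard:

    # Run from the edge node; Hadoop copies the jar into the job's
    # staging area in HDFS and ships it to the task nodes
    hadoop jar /home/devuser/jobs/wordcount.jar com.example.WordCount \
        -libjars /home/devuser/jobs/lib/helper.jar \
        /user/devuser/input /user/devuser/output

Nothing has to be pre-installed on the DataNodes; the main jar and
anything passed via -libjars are distributed to the mappers and reducers
automatically.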



Re: Where do/should .jar files live?

Posted by be...@gmail.com.
Hi Chris

In larger clusters it is better to have an edge/client node where all the user jars reside, and to trigger your MR jobs from there.

A client/edge node is a server with the Hadoop jars and configuration, but hosting no daemons.

In smaller clusters one DN might act as the client node, and you can execute your jars from there. The risk is that this DN fills up if files are copied into HDFS from it (per the block placement policy, one replica is always written to the local node).
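
If you want to see that placement for yourself, you can check where the
replicas of a file landed with fsck (the path below is just an example):

    # Copy a file into HDFS from the DN acting as client, then
    # list its block locations; the local DN holds one replica
    hadoop fs -put results.jar /user/devuser/
    hadoop fsck /user/devuser/results.jar -files -blocks -locations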


With Oozie you put your executables into HDFS, but Oozie comes in at the integration stage. In the initial development phase, developers put jars on the local file system (LFS) of the client node to execute and test their code.
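
As a rough sketch of how an Oozie application is laid out (all names and
paths here are examples), the job jar goes in a lib/ directory next to
workflow.xml in HDFS, and the workflow is started from any machine with
the Oozie client:

    # Hypothetical HDFS layout for an Oozie workflow application
    /user/devuser/apps/wordcount/workflow.xml
    /user/devuser/apps/wordcount/lib/wordcount.jar

    # Submit and run the workflow against the Oozie server
    oozie job -oozie http://oozieserver:11000/oozie \
        -config job.properties -run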

Regards 
Bejoy KS

Sent from remote device, Please excuse typos


