Posted to mapreduce-user@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2011/06/21 23:02:53 UTC

Re: Large startup time in remote MapReduce job

Allen,

On Wed, Jun 22, 2011 at 2:28 AM, Allen Wittenauer <aw...@apache.org> wrote:
>
> On Jun 21, 2011, at 1:31 PM, Harsh J wrote:
>
>> Gabor,
>>
>> If your jar does not contain code changes that need to get transmitted
>> every time, you can consider placing them on the JT/TT classpaths
>
>        ... which means you get to bounce your system every time you change code.

It's ugly, but if the jar filename remains the same there shouldn't
need to be any bouncing. Doable if there's no activity at the time of
replacement?
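
A minimal sketch of how such a same-name replacement could be staged so
that nothing ever sees a half-written jar, assuming Java 7+ (NIO.2) and
a staging file on the same filesystem as the target. The names `JarSwap`
and `replaceInPlace` are hypothetical and not part of Hadoop. Whether
already-running JT/TT daemons pick up the new classes is a separate
question that this sketch does not settle.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not a Hadoop API.
final class JarSwap {
    private JarSwap() {}

    /**
     * Copies newJar next to target under a temporary name, then renames it
     * over target in a single step so the filename never changes.
     */
    static void replaceInPlace(Path newJar, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Stage in the same directory so the final rename stays on one filesystem.
        Path staged = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.copy(newJar, staged, StandardCopyOption.REPLACE_EXISTING);
            // Assumption: the platform replaces an existing target on an atomic
            // move (POSIX rename does); the Java API leaves this to the platform.
            Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // Only has an effect if the copy or move failed part-way.
            Files.deleteIfExists(staged);
        }
    }
}
```

Something like `JarSwap.replaceInPlace(Paths.get("build/myjar.jar"),
Paths.get("/opt/hadoop/lib/myjar.jar"))` would be the intended call; both
paths here are made up for illustration.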

P.S. Gabor: Moving the discussion to mapreduce-user@hadoop.apache.org.
Please use the general@ list only for general project-wide discussions
on Hadoop.

-- 
Harsh J

Re: Large startup time in remote MapReduce job

Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 22, 2011, at 10:08 AM, Allen Wittenauer wrote:

> 
> On Jun 21, 2011, at 2:02 PM, Harsh J wrote:
>>>> 
>>>> If your jar does not contain code changes that need to get transmitted
>>>> every time, you can consider placing them on the JT/TT classpaths
>>> 
>>>       ... which means you get to bounce your system every time you change code.
>> 
>> It's ugly, but if the jar filename remains the same there shouldn't
>> need to be any bouncing. Doable if there's no activity at the time of
>> replacement?
> 
> 	I have yet to find a jar file that never ever changes.


	(I suppose the exception to this rule is all the Java code at places like NASA, JPL, etc. involved with the space program.  But that's totally cheating!)



Re: Large startup time in remote MapReduce job

Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 21, 2011, at 2:02 PM, Harsh J wrote:
>>> 
>>> If your jar does not contain code changes that need to get transmitted
>>> every time, you can consider placing them on the JT/TT classpaths
>> 
>>        ... which means you get to bounce your system every time you change code.
> 
> It's ugly, but if the jar filename remains the same there shouldn't
> need to be any bouncing. Doable if there's no activity at the time of
> replacement?

	I have yet to find a jar file that never ever changes.

	It may take a year+, but it will change.  The corollary is that you'll need to change it at the worst possible time and in an incompatible way such that older code will break and need to be upgraded.  So don't do it. :)


Re: Large startup time in remote MapReduce job

Posted by John Armstrong <jo...@ccri.com>.
On Wed, 22 Jun 2011 00:15:56 +0200, Gabor Makrai <ma...@gmail.com> wrote:
> Fortunately, DistributedCache solved my problem! I put a jar file on
> HDFS, which contains the necessary classes for the job, and I used this:
> DistributedCache.addFileToClassPath(new Path("/myjar/myjar.jar"), conf);

Can I ask which version of Hadoop you're using?  Whenever I try to use
addFileToClassPath on 0.20.2+737 it adds the file to the distributed cache
but my mappers and reducers still can't find the classes.  I'm stuck with
handing around a huge fat jar as my job.jar that contains all the
dependencies my mappers and reducers need.  I think this is related to
MAPREDUCE-752, but so far nobody on this list has really tried to give a
real diagnosis.
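
One way to narrow this kind of problem down is to log, from inside the
task JVM, what the classpath actually looks like and whether a given
class can be loaded. Below is a minimal sketch using only the JDK. The
class `ClasspathProbe` and its methods are hypothetical names, and
calling it from a mapper's setup code is an assumption, not something
Hadoop does for you.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not a Hadoop API.
final class ClasspathProbe {
    private ClasspathProbe() {}

    /** Returns the non-empty entries of the java.class.path system property. */
    static List<String> classpathEntries() {
        String cp = System.getProperty("java.class.path", "");
        List<String> entries = new ArrayList<>();
        for (String entry : cp.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Returns true if className resolves through the context class loader. */
    static boolean canLoad(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProbe.class.getClassLoader();
        }
        try {
            // 'false' avoids running static initializers just to probe.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String entry : classpathEntries()) {
            System.out.println("classpath entry: " + entry);
        }
        // Pass the class names to check as program arguments.
        for (String name : args) {
            System.out.println(name + " loadable: " + canLoad(name));
        }
    }
}
```

If the jar shows up in the task's working directory but not in the
printed classpath, that would at least separate "the file was not
shipped" from "the file was shipped but never put on the classpath".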

Re: Large startup time in remote MapReduce job

Posted by Gabor Makrai <ma...@gmail.com>.
Fortunately, DistributedCache solved my problem! I put a jar file on
HDFS, which contains the necessary classes for the job, and I used this:
DistributedCache.addFileToClassPath(new Path("/myjar/myjar.jar"), conf);

Thanks for the fast answer!
And sorry for my mistake (about the wrong list); that was my first one!
Thanks again!

Best,
Gabor

On Tue, Jun 21, 2011 at 11:02 PM, Harsh J <ha...@cloudera.com> wrote:

> Allen,
>
> On Wed, Jun 22, 2011 at 2:28 AM, Allen Wittenauer <aw...@apache.org> wrote:
> >
> > On Jun 21, 2011, at 1:31 PM, Harsh J wrote:
> >
> >> Gabor,
> >>
> >> If your jar does not contain code changes that need to get transmitted
> >> every time, you can consider placing them on the JT/TT classpaths
> >
> >        ... which means you get to bounce your system every time you change code.
>
> It's ugly, but if the jar filename remains the same there shouldn't
> need to be any bouncing. Doable if there's no activity at the time of
> replacement?
>
> P.s. Gabor: Moving the discussion to mapreduce-user@hadoop.apache.org
> Please use the general@ list only for general project-wide discussions
> on Hadoop.
>
> --
> Harsh J
>