Posted to mapreduce-user@hadoop.apache.org by Yang <te...@gmail.com> on 2014/10/24 20:51:37 UTC

why does hadoop create the /tmp/hadoop-user/hadoop-unjar-xxxx/ dir and unjar my fat jar?

I just noticed that when I run "hadoop jar
my-fat-jar-with-all-dependencies.jar", it unjars the job jar into
/tmp/hadoop-username/hadoop-unjar-xxxx/ and extracts all the classes
there.

The fat jar is pretty big, so the extraction takes up a lot of space
(particularly inodes), and I ran out of quota.

I wonder why we have to unjar these classes on the **client node**? The
jar won't even be accessed until it reaches the compute nodes, right?

Re: why does hadoop create the /tmp/hadoop-user/hadoop-unjar-xxxx/ dir and unjar my fat jar?

Posted by Harsh J <ha...@cloudera.com>.
If you use 'hadoop jar' to invoke your application, this is the default
behaviour. It is done because the utility supports a jars-within-jar
feature, which lets one pack additional dependency jars into an application
as a lib/ subdirectory under the root of the main jar.
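
For illustration only (the dependency names below are made up), listing such
a jar with the JDK's 'jar' tool would show the dependencies still packed as
whole jars under lib/, rather than exploded into thousands of individual
.class files:

jar tf my-fat-jar-with-all-dependencies.jar
# your/app/mainClass.class
# lib/some-dependency-1.0.jar
# lib/another-dependency-2.3.jar

RunJar still extracts the archive into the temporary directory, but it only
needs to write out your own classes plus a handful of packed jars, which are
then placed on the launched program's classpath.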

It is not a configurable behaviour presently, so given your inode issue,
you may want to either use the jars-within-jar feature, which does not
produce massive numbers of .class files because the dependencies stay packed
as jars under the jar's lib/, or avoid 'hadoop jar' (the RunJar utility)
altogether by invoking your application directly with the generated classpath:

java -cp $(hadoop classpath):my-fat-jar-with-all-dependencies.jar your.app.mainClass
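
If you invoke it this way regularly, a small wrapper script keeps the
command manageable (a sketch only; substitute your real jar path and main
class):

#!/bin/sh
# Launch the job client in a plain JVM, skipping RunJar's unpack step.
# 'hadoop classpath' prints the Hadoop jars and config directories to prepend.
JAR=my-fat-jar-with-all-dependencies.jar
exec java -cp "$(hadoop classpath):$JAR" your.app.mainClass "$@"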

On Sat, Oct 25, 2014 at 3:17 PM, Yang <te...@gmail.com> wrote:

> I thought this might be because hadoop wants to pack everything
> (including the -files distributed cache files) into one single jar, so I
> removed the -files arguments I had.
>
> But it still extracts the jar. This is rather confusing.
>
>
>
> On Fri, Oct 24, 2014 at 11:51 AM, Yang <te...@gmail.com> wrote:
>
>> I just noticed that when I run "hadoop jar
>> my-fat-jar-with-all-dependencies.jar", it unjars the job jar into
>> /tmp/hadoop-username/hadoop-unjar-xxxx/ and extracts all the classes
>> there.
>>
>> The fat jar is pretty big, so the extraction takes up a lot of space
>> (particularly inodes), and I ran out of quota.
>>
>> I wonder why we have to unjar these classes on the **client node**? The
>> jar won't even be accessed until it reaches the compute nodes, right?
>>
>
>


-- 
Harsh J

Re: why does hadoop create the /tmp/hadoop-user/hadoop-unjar-xxxx/ dir and unjar my fat jar?

Posted by Yang <te...@gmail.com>.
I thought this might be because hadoop wants to pack everything (including
the -files distributed cache files) into one single jar, so I removed the
-files arguments I had.

But it still extracts the jar. This is rather confusing.



On Fri, Oct 24, 2014 at 11:51 AM, Yang <te...@gmail.com> wrote:

> I just noticed that when I run "hadoop jar
> my-fat-jar-with-all-dependencies.jar", it unjars the job jar into
> /tmp/hadoop-username/hadoop-unjar-xxxx/ and extracts all the classes
> there.
>
> The fat jar is pretty big, so the extraction takes up a lot of space
> (particularly inodes), and I ran out of quota.
>
> I wonder why we have to unjar these classes on the **client node**? The
> jar won't even be accessed until it reaches the compute nodes, right?
>
