Posted to hdfs-user@hadoop.apache.org by "Bourgon, Armel" <Ar...@citygridmedia.com> on 2015/10/15 23:40:06 UTC

hadoop 2.6 jar command doesn't load text files in classpath

Hello,

To give you a bit of context, I wrote a Java library that aims to provide an easy way to coordinate multiple MR jobs and execute them with a single jar submission. The final result is a "fat jar" (built using the Maven assembly plugin) that contains the different Mapper and Reducer classes and a Main class that has the logic to submit the different jobs to the cluster.

To accomplish this, the Main relies on some text files (packaged in the jar) being present. Those files are not needed by the MR jobs themselves; they are a kind of configuration that tells the Main how it should schedule the different MR jobs.
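For readers less familiar with this pattern, here is a minimal sketch of how such a Main might read one of its scheduling files from the classpath. It assumes a hypothetical resource named `schedule.txt` at the root of the jar and a hypothetical class `ScheduleLoader`; the post does not name the real files or classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ScheduleLoader {

    // Hypothetical resource name: the post does not say what the files are called.
    static final String SCHEDULE_RESOURCE = "/schedule.txt";

    // Reads a text resource from the classpath, one entry per line.
    // Class.getResourceAsStream treats a leading '/' as "from the classpath root"
    // and delegates to the class loader that loaded ScheduleLoader.
    public static List<String> readLines(String resource) throws IOException {
        InputStream in = ScheduleLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resource);
        }
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String entry : readLines(SCHEDULE_RESOURCE)) {
            System.out.println(entry);
        }
    }
}
```

A lookup like this succeeds or fails depending on which class loader loaded `ScheduleLoader`, which is exactly the part that differs between `java -jar` and `hadoop jar`.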

The jar is executed like this:
hadoop jar the_jar_file.jar <args>

It has been used in production for a long time, but we recently decided to upgrade to Hadoop 2.6 (we were using 0.20). All our jobs packaged this way are failing because the Main cannot locate the text files on the classpath.

I did a bit of debugging by replacing the Main with a piece of code that prints the contents of the classpath. When running the jar with:
java -jar the_jar_file.jar <args>

I can see the text files in the list. But when I run the same jar with:
hadoop jar the_jar_file.jar <args>

The text files are missing. I assume that something changed in the way the hadoop jar command reads the jar and builds the classpath. I found someone reporting the same issue on Stack Overflow (http://stackoverflow.com/questions/31670390/accessing-jar-resource-when-run-in-hadoop) but nobody replied.
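A classpath dump like the one described above can be sketched as follows. This is a hypothetical diagnostic (`ClasspathProbe` and the default name `schedule.txt` are placeholders), not the code from the post. It prints `java.class.path`, walks the class-loader chain, and asks each relevant loader for the resource directly. As far as I understand the `hadoop jar` launcher, the user jar is loaded through a class loader created at runtime rather than being listed in `java.class.path`, so querying the loaders is likely a more reliable test than reading that property alone.

```java
import java.net.URLClassLoader;
import java.util.Arrays;

public class ClasspathProbe {

    // Reports whether a resource is visible through each loader a driver class
    // might end up using. ClassLoader resource names have no leading '/'
    // (unlike Class.getResource).
    public static String probe(String name) {
        ClassLoader own = ClasspathProbe.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        StringBuilder report = new StringBuilder();
        report.append("own loader: ")
              .append(own == null ? "bootstrap" : String.valueOf(own.getResource(name)))
              .append('\n');
        report.append("context loader: ")
              .append(context == null ? "none" : String.valueOf(context.getResource(name)))
              .append('\n');
        report.append("system loader: ")
              .append(String.valueOf(ClassLoader.getSystemResource(name)))
              .append('\n');
        return report.toString();
    }

    // Walks from this class's loader up to the bootstrap loader, printing the
    // URLs of any URLClassLoader along the way.
    public static void printLoaderChain() {
        for (ClassLoader l = ClasspathProbe.class.getClassLoader(); l != null; l = l.getParent()) {
            System.out.println(l);
            if (l instanceof URLClassLoader) {
                System.out.println("  urls: " + Arrays.toString(((URLClassLoader) l).getURLs()));
            }
        }
    }

    public static void main(String[] args) {
        // java.class.path reflects how the JVM was started; it does not list
        // jars that a launcher adds later through its own class loader.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        printLoaderChain();
        String name = args.length > 0 ? args[0] : "schedule.txt"; // hypothetical default
        System.out.print(probe(name));
    }
}
```

On Java 9 and later the application class loader is no longer a `URLClassLoader`, so its URLs are not printed; on the Java 7/8 runtimes typical for Hadoop 2.6 they are.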

I would like to keep the same mechanism (keep those conf files in the jar and access them at runtime from the classpath). Maybe there is an option to alter the way the jar command behaves? Can someone point me to the source code of the jar command?
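One way to keep the files in the jar while making the lookup less sensitive to which loader is in effect is to ask several loaders in turn. The sketch below is an assumption-laden workaround, not a documented Hadoop option: `ResourceLookup` and `open` are hypothetical names, and the search order (thread context loader, then the loader of a given class, then the system loader) is one reasonable choice among several.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public final class ResourceLookup {

    private ResourceLookup() {}

    // Tries, in order: the thread context class loader (which a launcher may
    // set to the loader holding the user jar), the loader that loaded `anchor`,
    // and finally the system class loader. The same loader may be tried twice;
    // that is harmless. Returns null if no loader can see the resource.
    public static InputStream open(Class<?> anchor, String name) {
        List<ClassLoader> loaders = new ArrayList<>();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            loaders.add(context);
        }
        if (anchor.getClassLoader() != null) {
            loaders.add(anchor.getClassLoader());
        }
        loaders.add(ClassLoader.getSystemClassLoader());
        for (ClassLoader loader : loaders) {
            InputStream in = loader.getResourceAsStream(name);
            if (in != null) {
                return in;
            }
        }
        return null;
    }
}
```

For example, `ResourceLookup.open(Main.class, "schedule.txt")`, where `Main` and `schedule.txt` stand in for the real driver class and file. Note that `ClassLoader` resource names have no leading `/`.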

Thanks!


Re: hadoop 2.6 jar command doesn't load text files in classpath

Posted by "Bourgon, Armel" <Ar...@citygridmedia.com>.
Hello Chris,

Thanks for your interest in this issue. I set up a local installation of 2.5.2 and I am actually running into the same issue.
I will work on a JIRA ticket tomorrow.

Best
—
Armel Bourgon

> On Oct 15, 2015, at 3:41 PM, Chris Nauroth <cn...@hortonworks.com> wrote:
> 
> Hello Armel,
> 
> That's an interesting find.  Thank you for reporting it.
> 
> I know you mentioned an upgrade from 0.20 to 2.6.  Do you also have the
> ability to test against an earlier 2.x release, like 2.5.2?  If the
> problem repros in 2.6.0, but not in 2.5.2, then I wonder if it's a
> regression introduced by the client-side classloader isolation that we
> shipped in 2.6.0.
> 
> https://issues.apache.org/jira/browse/HADOOP-10893
> 
> 
> If you're interested in looking at the code, the most relevant piece is
> the RunJar class, which is the main entry point of the "hadoop jar"
> command.
> 
> If you have a simplified consistent repro that demonstrates the problem,
> then I suggest filing a JIRA for further investigation.  Thanks again.
> 
> --Chris Nauroth
> 
> 
> 
> 
> On 10/15/15, 2:40 PM, "Bourgon, Armel" <Ar...@citygridmedia.com>
> wrote:
> 
>> Hello,
>> 
>> To give you a bit of context, I wrote a Java library that aims to provide
>> an easy way to coordinate multiple MR jobs and execute them with a single
>> jar submission. The final result is a "fat jar" (built using the Maven
>> assembly plugin) that contains the different Mapper and Reducer classes
>> and a Main class that has the logic to submit the different jobs to the
>> cluster.
>> 
>> To accomplish this, the Main relies on some text files (packaged in the
>> jar) being present. Those files are not needed by the MR jobs themselves;
>> they are a kind of configuration that tells the Main how it should
>> schedule the different MR jobs.
>> 
>> The jar is executed like this:
>> hadoop jar the_jar_file.jar <args>
>> 
>> It has been used in production for a long time, but we recently
>> decided to upgrade to Hadoop 2.6 (we were using 0.20). All our jobs
>> packaged this way are failing because the Main cannot locate the text
>> files on the classpath.
>> 
>> I did a bit of debugging by replacing the Main with a piece of code that
>> prints the contents of the classpath. When running the jar with:
>> java -jar the_jar_file.jar <args>
>> 
>> I can see the text files in the list. But when I run the same jar with:
>> hadoop jar the_jar_file.jar <args>
>> 
>> The text files are missing. I assume that something changed in the way
>> the hadoop jar command reads the jar and builds the classpath. I found
>> someone reporting the same issue on Stack Overflow
>> (http://stackoverflow.com/questions/31670390/accessing-jar-resource-when-run-in-hadoop)
>> but nobody replied.
>> 
>> I would like to keep the same mechanism (keep those conf files in the
>> jar and access them at runtime from the classpath). Maybe there is an
>> option to alter the way the jar command behaves? Can someone point me
>> to the source code of the jar command?
>> 
>> Thanks!
>> 
> 


Re: hadoop 2.6 jar command doesn't load text files in classpath

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Armel,

That's an interesting find.  Thank you for reporting it.

I know you mentioned an upgrade from 0.20 to 2.6.  Do you also have the
ability to test against an earlier 2.x release, like 2.5.2?  If the
problem repros in 2.6.0, but not in 2.5.2, then I wonder if it's a
regression introduced by the client-side classloader isolation that we
shipped in 2.6.0.

https://issues.apache.org/jira/browse/HADOOP-10893


If you're interested in looking at the code, the most relevant piece is
the RunJar class, which is the main entry point of the "hadoop jar"
command.
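To make the pointer to RunJar a little more concrete: as far as I understand, RunJar does not put the user jar on the JVM's launch classpath; it runs the jar's main class through a class loader it creates itself. The following is a generic sketch of that launcher pattern, not Hadoop's code; `LauncherSketch`, `loaderFor`, and `launch` are hypothetical names. It illustrates why a resource can be visible through the launcher's loader while never appearing in `java.class.path`.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

public class LauncherSketch {

    // Builds a loader over a user jar (or directory). The parent is the
    // launcher's own loader, so framework classes still resolve.
    // Not Hadoop's actual RunJar implementation.
    public static URLClassLoader loaderFor(File jarOrDir) throws MalformedURLException {
        // File.toURI() adds a trailing '/' for existing directories, which
        // URLClassLoader needs in order to treat the URL as a directory.
        URL[] urls = { jarOrDir.toURI().toURL() };
        return new URLClassLoader(urls, LauncherSketch.class.getClassLoader());
    }

    // Runs mainClassName.main(args) with the new loader installed as the
    // thread context class loader, restoring the previous one afterwards.
    public static void launch(File jarOrDir, String mainClassName, String[] args) throws Exception {
        ClassLoader previous = Thread.currentThread().getContextClassLoader();
        try (URLClassLoader loader = loaderFor(jarOrDir)) {
            Thread.currentThread().setContextClassLoader(loader);
            Class<?> mainClass = Class.forName(mainClassName, true, loader);
            mainClass.getMethod("main", String[].class).invoke(null, (Object) args);
        } finally {
            Thread.currentThread().setContextClassLoader(previous);
        }
    }
}
```

Usage would look like `LauncherSketch.launch(new File("the_jar_file.jar"), "com.example.Main", args)`, where `com.example.Main` is a placeholder for the driver class. Code running inside that `main` sees the jar's resources through `loader`, but `System.getProperty("java.class.path")` is unchanged.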

If you have a simplified consistent repro that demonstrates the problem,
then I suggest filing a JIRA for further investigation.  Thanks again.

--Chris Nauroth




On 10/15/15, 2:40 PM, "Bourgon, Armel" <Ar...@citygridmedia.com>
wrote:

>Hello,
>
>To give you a bit of context, I wrote a Java library that aims to provide
>an easy way to coordinate multiple MR jobs and execute them with a single
>jar submission. The final result is a "fat jar" (built using the Maven
>assembly plugin) that contains the different Mapper and Reducer classes
>and a Main class that has the logic to submit the different jobs to the
>cluster.
>
>To accomplish this, the Main relies on some text files (packaged in the
>jar) being present. Those files are not needed by the MR jobs themselves;
>they are a kind of configuration that tells the Main how it should
>schedule the different MR jobs.
>
>The jar is executed like this:
>hadoop jar the_jar_file.jar <args>
>
>It has been used in production for a long time, but we recently
>decided to upgrade to Hadoop 2.6 (we were using 0.20). All our jobs
>packaged this way are failing because the Main cannot locate the text
>files on the classpath.
>
>I did a bit of debugging by replacing the Main with a piece of code that
>prints the contents of the classpath. When running the jar with:
>java -jar the_jar_file.jar <args>
>
>I can see the text files in the list. But when I run the same jar with:
>hadoop jar the_jar_file.jar <args>
>
>The text files are missing. I assume that something changed in the way
>the hadoop jar command reads the jar and builds the classpath. I found
>someone reporting the same issue on Stack Overflow
>(http://stackoverflow.com/questions/31670390/accessing-jar-resource-when-run-in-hadoop)
>but nobody replied.
>
>I would like to keep the same mechanism (keep those conf files in the
>jar and access them at runtime from the classpath). Maybe there is an
>option to alter the way the jar command behaves? Can someone point me
>to the source code of the jar command?
>
>Thanks!
>

