Posted to mapreduce-user@hadoop.apache.org by Viral Bajaria <vi...@gmail.com> on 2013/01/07 12:42:22 UTC

hadoop 0.23.5 -files and -libjars

Hi,

I have been experimenting with the hadoop jar command in Hadoop 0.23.5 and
Hive 0.9.0, and wanted to run a custom MapReduce job using:

hadoop jar <jar> <main-class> -libjars "comma-separated list of files"
-files "comma-separated list of files"

Both -libjars and -files specify the same files. The first problem is that
when I use GenericOptionsParser, it sets
"mapreduce.client.genericoptionsparser.used" to true but does not populate
the "tmpjars" and "tmpfiles" configuration properties. I am not sure why.
Any ideas?

I then looked at the GenericOptionsParser code on GitHub and copied the
relevant pieces into my Driver class. That exact same code does add
"tmpjars" and "tmpfiles" to the job.xml. I am not sure why this works while
invoking GenericOptionsParser itself does not.

I can see the -files entries in the filecache. I also printed the classpath
in the Mapper setup method and can see the jar listed as file:/{filepath}.
I checked that path: it exists and is accessible to that user. However,
when I call getClassByName on a class in that jar, I keep getting a
ClassNotFoundException. Any reason why this would happen? If the file is on
the classpath, I would expect either Class.forName or
context.getConfiguration().getClassByName to work, but neither does.
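
One way to narrow down a ClassNotFoundException like this is a small
stand-alone probe that reports which classpath entries are missing on disk
and whether a given class is visible to each class loader. The sketch below
uses only the JDK. The class name ClasspathProbe and its helper methods are
hypothetical, and it assumes the classpath of interest is the one in the
java.class.path system property, which may differ from what a Hadoop task
actually uses.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Hadoop.
public class ClasspathProbe {

    /**
     * Returns the entries of a classpath string that do not exist on disk.
     * Wildcard entries such as "lib/*" are not expanded, so they are
     * reported as missing.
     */
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    /** Tries to load a class through one specific loader, without initializing it. */
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the fully qualified class name to look for as the first argument.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        String classpath = System.getProperty("java.class.path", "");

        System.out.println("missing classpath entries: " + missingEntries(classpath));

        // The loader that loaded this class and the thread context loader can
        // differ, so a class may be visible to one and not to the other.
        ClassLoader own = ClasspathProbe.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("visible to own loader:     " + canLoad(className, own));
        System.out.println("visible to context loader: " + canLoad(className, context));
    }
}
```

The same two helper calls could be made from the Mapper setup method. If the
class is visible to one loader but not the other, the jar is probably being
picked up, and the problem is which loader performs the lookup.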

I am running a single-node cluster for all of this experimentation, but I
don't want to explicitly add all the jars to hadoop-env or yarn-env.

Any help will be appreciated.

Thanks,
Viral