Posted to mapreduce-dev@hadoop.apache.org by Pedro Dusso <pm...@gmail.com> on 2014/05/20 14:27:06 UTC

Problems setting custom class for the Pluggable Sort in MapReduce Next Generation

Hello,

I'm developing a custom map output buffer that uses replacement selection
instead of quicksort. It's available here:
<https://bitbucket.org/pmdusso/hadoop-replacement-selection-sort/overview>.
It is based on the new pluggable sort interface introduced by JIRA issue
MAPREDUCE-2454 <https://issues.apache.org/jira/browse/MAPREDUCE-2454>.
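For reference, replacement selection builds sorted runs with a heap instead of filling a buffer and sorting it with quicksort: each key written out is replaced in the heap by the next input key, and a key smaller than the last one written is held back for the following run. On random input this yields runs about twice the size of the heap. The sketch below is a generic in-memory illustration under simplified assumptions (int keys, a heap capacity counted in keys, runs returned as lists); the class ReplacementSelection and its methods are hypothetical and are neither the code in the linked repository nor Hadoop's collector interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public final class ReplacementSelection {

    /** A buffered key tagged with the run it is allowed to join. */
    private static final class Entry {
        final int run;
        final int key;

        Entry(int run, int key) {
            this.run = run;
            this.key = key;
        }
    }

    /** Splits the input into sorted runs using a heap holding at most capacity keys. */
    public static List<List<Integer>> generateRuns(Iterable<Integer> input, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        // Order by run first, so keys held back for the next run are only
        // polled after the current run is exhausted.
        PriorityQueue<Entry> heap = new PriorityQueue<>(
                Comparator.comparingInt((Entry e) -> e.run).thenComparingInt(e -> e.key));
        Iterator<Integer> it = input.iterator();
        while (heap.size() < capacity && it.hasNext()) {
            heap.add(new Entry(0, it.next()));   // initial fill: everything belongs to run 0
        }

        List<List<Integer>> runs = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentRun = -1;
        while (!heap.isEmpty()) {
            Entry smallest = heap.poll();
            if (smallest.run != currentRun) {    // current run exhausted: start a new one
                current = new ArrayList<>();
                runs.add(current);
                currentRun = smallest.run;
            }
            current.add(smallest.key);           // "write" the key to the current run
            if (it.hasNext()) {
                int next = it.next();
                // A key below the one just written would break this run's order.
                int run = next >= smallest.key ? currentRun : currentRun + 1;
                heap.add(new Entry(run, next));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(5, 1, 8, 3, 9, 2, 7, 4, 6);
        System.out.println(generateRuns(input, 3)); // prints [[1, 3, 5, 8, 9], [2, 4, 6, 7]]
    }
}
```

With a heap of three keys, the nine input keys come out as two sorted runs of five and four keys, rather than the three runs of three that sorting three-key buffers would give.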

I've tested it successfully on a single-node installation, configuring the
job at creation time like this:

  conf.set("io.serializations",
      "io.serialization.WritableSerializationWithZeroEndingText");
  conf.set("mapreduce.job.map.output.collector.class",
      "pluggable.MapOutputHeapWithMetadataHeap");

Until now I built a runnable jar and ran it normally with java -jar ...
Now I would like to try it on a multi-node cluster (which runs ordinary
jobs fine), so I removed the hard-coded configuration and started launching
the jar like this:

  hadoop jar jars/wordCount.jar \
      -Dmapreduce.task.io.sort.mb=16 \
      -Dmapreduce.job.map.output.collector.class=pluggable.MapOutputHeapWithMetadataHeap \
      -Dio.serializations=io.serialization.WritableSerializationWithZeroEndingText \
      /wordcount/words /wordcount/output/out
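A note on this launch style: hadoop jar passes the arguments that follow the jar (and the main class, if one is named) straight to the job's main method. The -D settings only reach the job configuration if the driver runs those arguments through Hadoop's GenericOptionsParser, which is what ToolRunner.run does for a class implementing Tool; a driver that builds its own Configuration in main sees them as ordinary program arguments. The sketch below is a deliberately simplified, hypothetical stand-in for that parsing step: the class DashDOptions is made up, handles only the -Dkey=value form, stops at the first argument that is not such an option, and ignores other generic options such as -libjars. It is not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified illustration of splitting "-Dkey=value" options from program arguments. */
public final class DashDOptions {

    /** Parsing result: configuration overrides plus the remaining (positional) arguments. */
    public static final class Parsed {
        public final Map<String, String> properties = new LinkedHashMap<>();
        public final List<String> remaining = new ArrayList<>();
    }

    public static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        boolean optionsDone = false;
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!optionsDone && arg.startsWith("-D") && eq > 2) {
                // "-Dname=value": the key runs from after "-D" up to the first '='.
                parsed.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                optionsDone = true;          // first non-option: stop looking for -D options
                parsed.remaining.add(arg);   // e.g. the input and output paths
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {
            "-Dmapreduce.task.io.sort.mb=16",
            "-Dmapreduce.job.map.output.collector.class=pluggable.MapOutputHeapWithMetadataHeap",
            "/wordcount/words", "/wordcount/output/out"});
        System.out.println(p.properties.get("mapreduce.task.io.sort.mb")); // prints 16
        System.out.println(p.remaining); // prints [/wordcount/words, /wordcount/output/out]
    }
}
```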

But I can't get this to work. I keep getting a ClassNotFoundException:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:383)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
    ... 11 more
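One detail in the trace may be worth checking: the name being looked up is pluggable.MapOutputHeapWithMetadataHeap.class, with a trailing .class, whereas the value in the command above is pluggable.MapOutputHeapWithMetadataHeap. A binary class name never ends in .class; Class.forName treats such a string as a class named "class" in a package called pluggable.MapOutputHeapWithMetadataHeap, so that lookup fails no matter what is on the classpath. Assuming the failing run really carried that suffix, the sketch below shows the difference using a JDK class. The helper loadConfiguredClass is hypothetical and only imitates the "Class ... not found" message format seen in the trace.

```java
/**
 * Hypothetical helper: resolves a configured class name with Class.forName and
 * reports failures in the same "Class ... not found" style seen in the trace.
 */
public final class ClassNameCheck {

    public static Class<?> loadConfiguredClass(String name, ClassLoader loader)
            throws ClassNotFoundException {
        String trimmed = name.trim();
        try {
            // initialize=false: only resolve the class, do not run static initializers
            return Class.forName(trimmed, false, loader);
        } catch (ClassNotFoundException e) {
            String hint = trimmed.endsWith(".class")
                    ? " (binary class names do not end in \".class\")"
                    : " (check that the class is on the task classpath)";
            throw new ClassNotFoundException("Class " + trimmed + " not found" + hint, e);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = ClassNameCheck.class.getClassLoader();
        // A correct binary name resolves.
        System.out.println(loadConfiguredClass("java.util.ArrayList", loader).getName());
        // The same name with a ".class" suffix does not.
        try {
            loadConfiguredClass("java.util.ArrayList.class", loader);
        } catch (ClassNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```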

I have two projects: one containing jobs such as wordCount and grep, and one
in which I'm developing the custom output buffer (the Bitbucket repository
linked above). Because of this, I tried several ways of packaging the jars:

   - The jobs project has an Eclipse project dependency on the buffer
   project. I exported a runnable jar with the required libraries packaged
   inside it, and also with them copied into a sub-folder.
   - The jobs project includes a jar built from the custom output buffer
   project.
   - A fat jar built with Maven (mvn) in the jobs project.
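Since all three packaging attempts end in the same error, it may help to confirm what actually ended up in the jar passed to hadoop jar. The sketch below uses only java.util.jar to report whether a class file such as pluggable/MapOutputHeapWithMetadataHeap.class sits at the top level of a jar or only inside a nested jar (standard JDK class loaders do not read classes from jars nested inside another jar). The class JarClassFinder is hypothetical and scans only one level of nesting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

/** Hypothetical diagnostic: where, if anywhere, does a class file live inside a jar? */
public final class JarClassFinder {

    /** Converts a binary name such as "pluggable.Foo" to its entry path "pluggable/Foo.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns "top-level" and/or "nested:<inner jar>" for each place the class file is found. */
    public static List<String> find(String jarPath, String className) throws IOException {
        String wanted = entryName(className);
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            if (jar.getJarEntry(wanted) != null) {
                hits.add("top-level");
            }
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".jar")) {
                    continue;
                }
                // Scan one level of nested jars (e.g. bundled libraries).
                try (JarInputStream in = new JarInputStream(jar.getInputStream(entry))) {
                    JarEntry inner;
                    while ((inner = in.getNextJarEntry()) != null) {
                        if (inner.getName().equals(wanted)) {
                            hits.add("nested:" + entry.getName());
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarClassFinder jars/wordCount.jar pluggable.MapOutputHeapWithMetadataHeap
        System.out.println(find(args[0], args[1]));
    }
}
```

An empty list means the class is not in the jar at all; a result containing only "nested:..." entries means it is present but hidden inside a bundled library jar.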


All of these failed. I would appreciate any help, since there seems to be
very little information about this online. If I've left out any important
information, please let me know and I will provide it.

Best regards,

Pedro Martins Dusso