Posted to user@crunch.apache.org by Lucy Chen <lu...@gmail.com> on 2015/04/02 23:00:59 UTC

Exception from Set.difference

Hi,

I am trying to compute a set difference as follows:

PCollection<MyClass> C = Set.difference(A, B);

Here both A and B are of type PCollection<MyClass>.

MyClass is defined as follows:


public class MyClass implements java.io.Serializable, Cloneable {

    private String a;
    private String b;
    private int c;
    private Map<String, Double> d;
    private int e;

    public MyClass() {
        this(null, null, 0, new HashMap<String, Double>());
    }

    public MyClass(String labelID, String sampleID, Integer pos_neg_ind,
            HashMap<String, Double> feat_val_pair) {
        ......
    }

    public MyClass(String input) {
        .....
    }

    .....
}


Running the set difference produces the error below. Is that because
MyClass includes a Map member, d? If so, is there another way to compute
the set difference with these inputs?


      Thanks!


Lucy


java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:160)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.avro.AvroRuntimeException: Can't compare maps!
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:134)
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:139)
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:92)
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:72)
    at org.apache.avro.mapred.AvroKeyComparator.compare(AvroKeyComparator.java:43)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:578)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:108)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:524)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:539)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:209)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.finalMerge(MergeManagerImpl.java:731)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.close(MergeManagerImpl.java:370)
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:158)
    ... 7 more

Re: Exception from Set.difference

Posted by Josh Wills <jw...@cloudera.com>.

Yeah, it looks like Avro doesn't support comparison on map fields:

https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java

Assuming the values of the map fields matter for comparison purposes, it
seems like your best bet is to serialize the data as a List of pairs, or as
two Lists with corresponding entries, making sure the lists are sorted by
the key of the map. Not a pretty solution, but it should work.
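
A minimal sketch of the "two Lists with corresponding entries" idea, using
only java.util. The class name MapAsSortedLists and its three methods are
hypothetical helpers for illustration; they are not part of Crunch or Avro.
MyClass would store the two lists in place of the Map member d. Sorting by
key gives two maps with equal contents the same list representation, however
their entries were inserted. The sketch assumes the map contains no null
keys, because TreeMap rejects them under natural ordering.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Crunch or Avro API): flattens a
// Map<String, Double> into two parallel lists ordered by key.
final class MapAsSortedLists {
    private MapAsSortedLists() {}

    // Keys in ascending natural (String) order.
    static List<String> sortedKeys(Map<String, Double> map) {
        return new ArrayList<String>(new TreeMap<String, Double>(map).keySet());
    }

    // Values in the order of their sorted keys, so that index i here
    // pairs with index i of sortedKeys(map).
    static List<Double> valuesInKeyOrder(Map<String, Double> map) {
        return new ArrayList<Double>(new TreeMap<String, Double>(map).values());
    }

    // Rebuilds the map from the two lists, for code that still wants a Map view.
    static Map<String, Double> toMap(List<String> keys, List<Double> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        Map<String, Double> map = new HashMap<String, Double>();
        for (int i = 0; i < keys.size(); i++) {
            map.put(keys.get(i), values.get(i));
        }
        return map;
    }
}
```

With this layout the record would hold a List<String> and a List<Double>
instead of a Map. Whether the resulting Avro schema then compares cleanly
depends on how the PType for MyClass is derived, so treat that part as an
assumption to verify against your own pipeline.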

J

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>