Posted to user@pig.apache.org by "leiwangouc@gmail.com" <le...@gmail.com> on 2014/04/15 06:23:51 UTC

Pig: java.lang.String cannot be cast to org.apache.pig.data.DataBag in specified map task

Hi, 

   I am using Cloudera and running a MapReduce job written in Pig Latin. I hit the following exception in a map task:
2014-04-15 11:30:39,532 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
	at org.apache.pig.builtin.Distinct.getDistinctFromNestedBags(Distinct.java:140)
	at org.apache.pig.builtin.Distinct.access$100(Distinct.java:39)
	at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:101)
	at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:94)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:376)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:354)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:263)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:220)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:210)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:185)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1477)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1587)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1199)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:609)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:675)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Looking at the stack trace, I think the exception is thrown here, in Distinct.getDistinctFromNestedBags:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.pig/pig/0.11.0-cdh4.3.1/org/apache/pig/builtin/Distinct.java  line 140

However, the second attempt of this map task succeeded. Both attempts used exactly the same data and the same code, which really confuses me.

Any insight about this?

Thanks,
Lei
 


leiwangouc@gmail.com