Posted to user@hadoop.apache.org by "leiwangouc@gmail.com" <le...@gmail.com> on 2014/04/15 06:23:51 UTC
Pig: java.lang.String cannot be cast to org.apache.pig.data.DataBag in a specific map task
Hi,
I am using Cloudera and running a MapReduce job written in Pig Latin. I hit the following exception in one map task:
2014-04-15 11:30:39,532 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.builtin.Distinct.getDistinctFromNestedBags(Distinct.java:140)
at org.apache.pig.builtin.Distinct.access$100(Distinct.java:39)
at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:101)
at org.apache.pig.builtin.Distinct$Intermediate.exec(Distinct.java:94)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:376)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:354)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:263)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:220)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:210)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:185)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1477)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1587)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1199)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:609)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:675)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Looking at the stack trace, I think the exception is thrown at line 140 of this file:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.pig/pig/0.11.0-cdh4.3.1/org/apache/pig/builtin/Distinct.java
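To make the failure mode concrete, here is a minimal, Pig-free Java sketch of the pattern the trace points to: an intermediate (combiner-side) distinct step that assumes field 0 of every input row is a bag and casts it without checking. The `Bag` class and the `distinctFromNestedBags` method below are hypothetical stand-ins, not Pig's `DataBag` or the real `Distinct` source, and the nesting is simplified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the failure mode: an intermediate step that
 * expects a partial bag in field 0 of each input row and casts blindly.
 * Bag stands in for org.apache.pig.data.DataBag; it is NOT Pig's API.
 */
public class DistinctCastSketch {

    /** Hypothetical stand-in for a bag of rows. */
    static final class Bag {
        final List<List<Object>> rows = new ArrayList<>();
    }

    /**
     * Counts distinct rows across the partial bags found in field 0 of
     * each input row (assumed to come from an earlier "initial" stage).
     */
    static int distinctFromNestedBags(List<List<Object>> inputs) {
        Set<List<Object>> seen = new HashSet<>();
        for (List<Object> row : inputs) {
            // Unchecked cast: if field 0 holds a String instead of a Bag,
            // this line throws ClassCastException.
            Bag partial = (Bag) row.get(0);
            seen.addAll(partial.rows);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Bag first = new Bag();
        first.rows.add(Arrays.<Object>asList("a"));
        first.rows.add(Arrays.<Object>asList("b"));
        Bag second = new Bag();
        second.rows.add(Arrays.<Object>asList("a"));

        // Well-formed input: every row carries a bag in field 0.
        List<List<Object>> good = Arrays.asList(
                Arrays.<Object>asList(first), Arrays.<Object>asList(second));
        System.out.println("distinct rows: " + distinctFromNestedBags(good));

        // Malformed input: one row carries a plain String in field 0.
        List<List<Object>> bad = Arrays.asList(
                Arrays.<Object>asList(first), Arrays.<Object>asList("a"));
        try {
            distinctFromNestedBags(bad);
        } catch (ClassCastException e) {
            System.out.println("caught ClassCastException: " + e.getMessage());
        }
    }
}
```

With well-formed input the sketch counts distinct rows (two here, "a" and "b"); a single row carrying a String in field 0 produces the same kind of ClassCastException as the one in the trace. The exact exception message text varies between JDK versions, so the sketch only prints whatever message it receives.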
However, the second attempt of this map task succeeded, using exactly the same data and the same code. This really confuses me.
Any insight into this?
Thanks,
Lei
leiwangouc@gmail.com