Posted to dev@pig.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2017/03/24 16:51:41 UTC

[jira] [Commented] (PIG-5134) Fix TestAvroStorage unit test in Spark mode

    [ https://issues.apache.org/jira/browse/PIG-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940712#comment-15940712 ] 

Nandor Kollar commented on PIG-5134:
------------------------------------

[~kellyzly] an update: I managed to solve this without using Kryo, but I don't really like the solution I came up with; using Kryo would be a better choice, I think. In my solution I implemented readObject and writeObject methods, which write the Avro schema as well as the data to the ObjectOutputStream and read them back from the ObjectInputStream. This is done only for AvroTupleWrapper, but I'm afraid we'll have to implement the same logic for the other Avro wrapper classes too. I noticed that a similar issue related to Spark and Avro compatibility was already resolved: AVRO-1502. It seems that it was only fixed for SpecificRecords, not for GenericRecords, which we use in Pig. [~rohini] do you have any recommendation on which option we should follow?
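
Roughly, the writeObject/readObject approach looks like the sketch below. The class and field names are illustrative only, not the actual patch:

{code}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class SerializableAvroRecord implements Serializable {

    // GenericData.Record is not Serializable, so the wrapped record has to be
    // transient and written out by hand in writeObject/readObject.
    private transient GenericRecord record;

    public SerializableAvroRecord(GenericRecord record) {
        this.record = record;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        // Write the schema first (as its JSON representation), then the datum
        // itself in Avro binary form. writeUTF is fine for schemas under 64 KB;
        // very large schemas would need a different encoding.
        out.writeUTF(record.getSchema().toString());
        // The "direct" encoder/decoder variants are unbuffered, so they don't
        // consume bytes beyond the datum from the underlying object stream.
        Encoder encoder = EncoderFactory.get().directBinaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
        encoder.flush();
    }

    private void readObject(ObjectInputStream in) throws IOException {
        // Read the schema back, then decode the datum with it.
        Schema schema = new Schema.Parser().parse(in.readUTF());
        Decoder decoder = DecoderFactory.get().directBinaryDecoder(in, null);
        record = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
    }

    public GenericRecord getRecord() {
        return record;
    }
}
{code}

The Kryo option, by contrast, would mostly be a configuration change on the Spark side, since Kryo doesn't require java.io.Serializable. Something along these lines (standard Spark settings; whether plain field-level Kryo handles GenericData.Record efficiently, given the embedded Schema, would still need to be verified):

{code}
SparkConf conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
// Optional: registering the classes up front avoids writing full class names
// into the serialized stream.
conf.registerKryoClasses(new Class<?>[] {
        org.apache.avro.generic.GenericData.Record.class
});
{code}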

> Fix TestAvroStorage unit test in Spark mode
> -------------------------------------------
>
>                 Key: PIG-5134
>                 URL: https://issues.apache.org/jira/browse/PIG-5134
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5134.patch
>
>
> It seems that the test fails because Avro's GenericData#Record doesn't implement the Serializable interface:
> {code}
> 2017-02-23 09:14:41,887 ERROR [main] spark.JobGraphBuilder (JobGraphBuilder.java:sparkOperToRDD(183)) - throw exception in sparkOperToRDD: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 9.0 (TID 9) had a not serializable result: org.apache.avro.generic.GenericData$Record
> Serialization stack:
> 	- object not serializable (class: org.apache.avro.generic.GenericData$Record, value: {"key": "stuff in closet", "value1": {"thing": "hat", "count": 7}, "value2": {"thing": "coat", "count": 2}})
> 	- field (class: org.apache.pig.impl.util.avro.AvroTupleWrapper, name: avroObject, type: interface org.apache.avro.generic.IndexedRecord)
> 	- object (class org.apache.pig.impl.util.avro.AvroTupleWrapper, org.apache.pig.impl.util.avro.AvroTupleWrapper@3d3a58c1)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> {code}
> The failing test is a new one, introduced when merging trunk into the Spark branch, which is why we didn't see this error before.


