Posted to issues@spark.apache.org by "Gabor Feher (JIRA)" <ji...@apache.org> on 2016/01/18 19:27:39 UTC

[jira] [Commented] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR

    [ https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105621#comment-15105621 ] 

Gabor Feher commented on SPARK-3630:
------------------------------------

Hi, I've run into this problem as well.

{code}
 com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
        at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
        at com.esotericsoftware.kryo.io.Input.require(Input.java:155)
        at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:169)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:201)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:198)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:103)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
...
Caused by: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
        at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:361)
        at org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:158)
        at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:142)
        at com.esotericsoftware.kryo.io.Input.fill(Input.java:140)
        ... 45 more

{code}

Spark version was 1.6.0 (the official "Hadoop free" download), running in YARN client mode on Amazon EMR.
I used these instructions to link in Hadoop: https://spark.apache.org/docs/1.6.0/hadoop-provided.html
Hadoop version was 2.6.0.

As far as I could see, the problem showed up only after a few hours of running, "in the middle" of a job.
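For reference, the serializer and codec combination on the failing read path can be written as a spark-defaults.conf fragment. This is a sketch under assumptions. The comment does not say how Kryo was enabled. The stack trace only shows that KryoDeserializationStream was reading from a SnappyInputStream during a shuffle read. In Spark 1.6, snappy is already the default value of spark.io.compression.codec and spark.shuffle.compress defaults to true, so only the serializer line would differ from a stock configuration.

{code}
# spark-defaults.conf (assumed settings, not taken from the report)
# Use Kryo instead of the default Java serializer
spark.serializer              org.apache.spark.serializer.KryoSerializer
# Codec for internal data such as shuffle outputs; snappy is the 1.6 default
spark.io.compression.codec    snappy
# Compress map output files (true by default)
spark.shuffle.compress        true
{code}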

> Identify cause of Kryo+Snappy PARSING_ERROR
> -------------------------------------------
>
>                 Key: SPARK-3630
>                 URL: https://issues.apache.org/jira/browse/SPARK-3630
>             Project: Spark
>          Issue Type: Task
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Andrew Ash
>            Assignee: Josh Rosen
>
> A recent GraphX commit caused non-deterministic exceptions in unit tests, so it was reverted (see SPARK-3400).
> Separately, [~aash] observed the same exception stacktrace in an application-specific Kryo registrator:
> {noformat}
> com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
> com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
> com.esotericsoftware.kryo.io.Input.require(Input.java:169)
> com.esotericsoftware.kryo.io.Input.readInt(Input.java:325)
> com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624)
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127) 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117) 
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) 
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> ...
> {noformat}
> This ticket is to identify the cause of the exception in the GraphX commit so the faulty commit can be fixed and merged back into master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
