Posted to issues@spark.apache.org by "Mridul Muralidharan (JIRA)" <ji...@apache.org> on 2014/08/28 15:28:08 UTC

[jira] [Commented] (SPARK-3277) LZ4 compression causes the ExternalSort exception

    [ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113741#comment-14113741 ] 

Mridul Muralidharan commented on SPARK-3277:
--------------------------------------------

This looks like an unrelated change that was pushed to BlockObjectWriter as part of the introduction of ShuffleWriteMetrics.
I had introduced checks and documented that we must not infer the size from the stream position after a flush, since close() can write additional data to the streams (and a flush can itself generate more compressed data, which need not be pushed through to the underlying stream).

Apparently this logic was modified later, causing this bug.
The solution would be to revert the change that updates shuffleBytesWritten before the stream is closed.
It must be updated after close() and based on file.length().
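
To illustrate why the size must be read after close(), here is a minimal, self-contained Scala sketch. It is not the actual BlockObjectWriter code; java.util.zip.GZIPOutputStream stands in for Spark's LZ4 block stream, since both emit trailing bytes when the stream is closed:

    import java.io.{File, FileOutputStream}
    import java.util.zip.GZIPOutputStream

    // Minimal sketch, not Spark code: GZIPOutputStream stands in for the
    // LZ4 block stream. Both write trailing bytes on close(), so a file
    // size observed after flush() understates the final batch size.
    object SizeAfterClose {
      def main(args: Array[String]): Unit = {
        val file = File.createTempFile("spill", ".bin")
        val out = new GZIPOutputStream(new FileOutputStream(file))
        out.write(Array.fill[Byte](1024)(42.toByte))
        out.flush()
        val sizeAfterFlush = file.length() // what the buggy metric update saw
        out.close()                        // close() emits the remaining block + trailer
        val sizeAfterClose = file.length() // the real size of the batch on disk
        println(s"after flush: $sizeAfterFlush bytes, after close: $sizeAfterClose bytes")
        assert(sizeAfterClose > sizeAfterFlush)
        file.delete()
      }
    }

This is the same effect that breaks the batch offsets recorded during a spill: offsets computed from the pre-close position no longer sum to the file length, and the assertion in DiskMapIterator fails.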

> LZ4 compression causes the ExternalSort exception
> -------------------------------------------------
>
>                 Key: SPARK-3277
>                 URL: https://issues.apache.org/jira/browse/SPARK-3277
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>            Reporter: hzw
>             Fix For: 1.1.0
>
>
> I tested LZ4 compression (with wordcount) and it ran into this problem.
> I also tested Snappy and LZF, and they were fine.
> Finally, I set "spark.shuffle.spill" to false to avoid the exception, but as soon as spilling is switched back on the error returns (see the configuration sketch after the stack trace).
> Exception info as follows:
> java.lang.AssertionError: assertion failed
>         at scala.Predef$.assert(Predef.scala:165)
>         at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.<init>(ExternalAppendOnlyMap.scala:416)
>         at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:235)
>         at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:150)
>         at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>         at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
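
For reference, the workaround the reporter describes amounts to the following hedged sketch for Spark 1.0.x (the app name and master are placeholders; disabling spill trades the assertion failure for higher memory pressure rather than fixing the underlying metric bug):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the reported workaround, not a fix: disabling shuffle spill
    // bypasses ExternalAppendOnlyMap's on-disk batches entirely, so the
    // size assertion is never reached. App name and master are placeholders.
    val conf = new SparkConf()
      .setAppName("wordcount")
      .setMaster("local[*]")
      .set("spark.shuffle.spill", "false")
    val sc = new SparkContext(conf)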


