Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/05/20 22:17:00 UTC
[jira] [Created] (SPARK-7766) KryoSerializerInstance re-use is not safe when auto-flush is disabled
Josh Rosen created SPARK-7766:
---------------------------------
Summary: KryoSerializerInstance re-use is not safe when auto-flush is disabled
Key: SPARK-7766
URL: https://issues.apache.org/jira/browse/SPARK-7766
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker
SPARK-3386 modified the shuffle write path to re-use serializer instances across multiple calls to DiskBlockObjectWriter. This turns out to have introduced a very rare bug when using KryoSerializer. If auto-reset is disabled and reference tracking is enabled, the same serializer instance ends up writing multiple output streams without {{reset()}} being called between them. Objects in one file may then contain references to objects that were written to previous files, which can cause errors during deserialization.
The fix should be simple: add {{reset}} calls at the end of {{serialize}} and {{serializeStream}}.
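The failure mode and the proposed fix can be illustrated with a small, self-contained Python model. It does not use the real Kryo or Spark API: `ToyKryo`, `serialize_stream`, and `read_stream` are hypothetical names. The model assumes only what the report describes. With reference tracking on and auto-reset off, an object the re-used instance has already written is emitted as a back-reference into earlier output, and a reader that opens a single file on its own cannot resolve that reference. The `reset_at_end` flag stands in for the {{reset}} call proposed above.

```python
# Toy model of the SPARK-7766 failure mode. ToyKryo, serialize_stream and
# read_stream are hypothetical illustrations, not Kryo's or Spark's API.

class ToyKryo:
    """Mimics a Kryo instance with reference tracking enabled and auto-reset
    disabled: any object already written since the last reset() is emitted
    as a back-reference instead of being written again."""

    def __init__(self):
        self._ref_ids = {}  # id(obj) -> reference id assigned when first written

    def reset(self):
        # Forget every object written so far (the role of Kryo's reset()).
        self._ref_ids.clear()

    def write(self, out, obj):
        key = id(obj)
        if key in self._ref_ids:
            out.append(("ref", self._ref_ids[key]))
        else:
            self._ref_ids[key] = len(self._ref_ids)  # sequential reference ids
            out.append(("obj", obj))


def serialize_stream(kryo, objects, reset_at_end):
    """Write one output stream (think: one shuffle file) with a re-used instance."""
    out = []
    for obj in objects:
        kryo.write(out, obj)
    if reset_at_end:
        kryo.reset()  # sketch of the proposed fix: reset once the stream is done
    return out


def read_stream(records):
    """Read one stream on its own, starting from an empty reference table."""
    seen, result = [], []
    for tag, value in records:
        if tag == "obj":
            seen.append(value)
            result.append(value)
        elif value < len(seen):
            result.append(seen[value])
        else:
            raise ValueError(f"unresolvable back-reference {value}")
    return result


shared = {"name": "shared"}  # the same object written to two different files

for reset_at_end in (False, True):
    kryo = ToyKryo()  # one instance re-used for both files
    file_a = serialize_stream(kryo, [shared], reset_at_end)
    file_b = serialize_stream(kryo, [shared], reset_at_end)
    try:
        print(reset_at_end, read_stream(file_b))
    except ValueError as e:
        print(reset_at_end, "error:", e)
```

Without the reset, the second file contains only a back-reference to an object stored in the first file, so reading it alone fails. With a reset after each stream, the object is written out in full again and the second file is readable on its own.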
Thanks to John Carrino for reporting this issue on GitHub: https://github.com/apache/spark/pull/5606#issuecomment-103995103
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)