Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/05/26 19:15:21 UTC

[jira] [Created] (SPARK-7873) Serializer re-use + Kryo autoReset disabled leads to ArrayIndexOutOfBounds exception in sort-shuffle bypassMergeSort path

Josh Rosen created SPARK-7873:
---------------------------------

             Summary: Serializer re-use + Kryo autoReset disabled leads to ArrayIndexOutOfBounds exception in sort-shuffle bypassMergeSort path
                 Key: SPARK-7873
                 URL: https://issues.apache.org/jira/browse/SPARK-7873
             Project: Spark
          Issue Type: Bug
          Components: Shuffle, Spark Core
    Affects Versions: 1.4.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Blocker


This is a somewhat obscure bug, but I think that it will seriously impact KryoSerializer users whose custom registrators disable auto-reset. Some of our shuffle paths end up creating multiple OutputStreams from the same shared SerializerInstance, which is unsafe: when auto-reset is disabled, the serializer's state is not cleared between writes, so it carries over from one stream to the next. To illustrate this, the following test fails in 1.4:

{code}
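// RegistratorWithoutAutoReset (definition not shown) is assumed to be a KryoRegistrator
// whose registerClasses() calls kryo.setAutoReset(false).
class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {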
  conf.set("spark.serializer", classOf[KryoSerializer].getName)
  conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)

  test("sort-shuffle with bypassMergeSort") {
    val myObject = ("Hello", "World")
    assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet === Set(myObject))
  }
}
{code}
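
For intuition about why sharing one instance across streams is unsafe, here is a toy model of the failure mode. It is only a sketch under stated assumptions: it does not use Kryo or Spark, the token format is invented, and every name in it (ToyWriter, ToyReader, SharedWriterDemo) is hypothetical. The writer keeps a table of values it has already written and is never reset, which stands in for auto-reset being disabled; each stream is then decoded independently with a fresh table.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for a serializer with reference tracking and no auto-reset.
// This is NOT Kryo's wire format; all names here are hypothetical.
sealed trait Token
case class New(value: String) extends Token // first occurrence: full value
case class Ref(index: Int) extends Token    // repeat: back-reference by index

class ToyWriter {
  // Never cleared between writes: models "auto-reset disabled".
  private val seen = ArrayBuffer.empty[String]

  def write(value: String, out: ArrayBuffer[Token]): Unit = {
    val idx = seen.indexOf(value)
    if (idx >= 0) {
      out += Ref(idx) // already written somewhere, possibly to another stream
    } else {
      seen += value
      out += New(value)
    }
  }
}

object ToyReader {
  // Each stream is decoded on its own, starting from an empty reference table.
  def read(in: Seq[Token]): Seq[String] = {
    val seen = ArrayBuffer.empty[String]
    in.map {
      case New(v) =>
        seen += v
        v
      case Ref(i) =>
        seen(i) // throws IndexOutOfBoundsException if this stream never defined i
    }
  }
}

object SharedWriterDemo {
  // Returns the decoded contents of stream A and whether decoding stream B failed.
  def run(): (Seq[String], Boolean) = {
    val shared = new ToyWriter
    val streamA = ArrayBuffer.empty[Token]
    val streamB = ArrayBuffer.empty[Token]

    shared.write("Hello", streamA) // emits New("Hello") into stream A
    shared.write("Hello", streamB) // emits Ref(0) into stream B

    val decodedA = ToyReader.read(streamA.toSeq)
    val bFailed =
      try { ToyReader.read(streamB.toSeq); false }
      catch { case _: IndexOutOfBoundsException => true }
    (decodedA, bFailed)
  }
}
```

In this model, stream B contains only a back-reference to a value that was written to stream A, so a reader that sees stream B alone cannot resolve it. Giving each stream its own writer, or resetting the shared writer before starting a new stream, avoids the problem in the toy model.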

This bug was introduced by a patch that enabled serializer re-use in some of the shuffle paths, since constructing new serializer instances is fairly expensive for KryoSerializer. We had already fixed another corner-case bug related to that patch, but missed this one. From an engineering risk management perspective, we probably should have just reverted the original serializer re-use patch and added a big cross-product-of-configurations-and-shuffle-managers test suite before attempting to fix the defects.

I think that I have a pretty simple fix for this, but we still might want to consider a revert for 1.4 just to be safe.


