Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2015/09/07 02:49:53 UTC

[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/8635

    [SPARK-10466][SQL] UnsafeRow SerDe exception with data spill

    Spilling data with UnsafeRow causes an assertion failure.
    
    ```
    java.lang.AssertionError: assertion failed
    	at scala.Predef$.assert(Predef.scala:165)
    	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeKey(UnsafeRowSerializer.scala:75)
    	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:180)
    	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:688)
    	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2$$anonfun$apply$1.apply(ExternalSorter.scala:687)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:687)
    	at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$2.apply(ExternalSorter.scala:683)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    	at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:683)
    	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:80)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    	at org.apache.spark.scheduler.Task.run(Task.scala:88)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark unsafe_spill

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8635
    
----
commit 684cdb6f92b973ab872996e1f434324f3160b4be
Author: Cheng Hao <ha...@intel.com>
Date:   2015-09-07T00:39:47Z

    UnsafeRow SerDe exception with data spill

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139089245
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138883641
  
    @andrewor14 It seems very difficult to write a simple unit test, because `ExternalSorter` has to work with many other components, so I added some mocks.
    
    The mocks should be helpful: I found another interesting bug with them, and I can continue fixing it once this is merged.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138269088
  
      [Test build #42090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42090/consoleFull) for   PR 8635 at commit [`229ce8a`](https://github.com/apache/spark/commit/229ce8a08223a46d992c7bc6df4880c9595d35e8).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138884475
  
      [Test build #42202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42202/consoleFull) for   PR 8635 at commit [`7f09a62`](https://github.com/apache/spark/commit/7f09a62c7ce9a4ed3f607831ebec72ef4c799c85).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139235293
  
      [Test build #42263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42263/console) for   PR 8635 at commit [`e8b27b5`](https://github.com/apache/spark/commit/e8b27b5515251c856b7711c0c253e3b92e354fab).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138757310
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138759152
  
    Thank you @andrewor14. I agree, a unit test like that is too tricky. I will follow your idea and rewrite the unit test.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139235426
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42263/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139203696
  
    retest this please




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39002189
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MiniSparkSQLClusterSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.apache.spark.{SparkFunSuite, SparkContext, SparkConf}
    +
    +class MiniSparkSQLClusterSuite extends SparkFunSuite {
    +  /**
    +   * Create a spark context with specified configuration.
    +   */
    +  protected def withSparkConf(sparkConfs: (String, String)*)(f: (SQLContext) => Unit): Unit = {
    +    val (keys, values) = sparkConfs.unzip
    +    val conf = new SparkConf()
    +      .setMaster("local[1]")
    +      .setAppName("testing")
    +
    +    (keys, values).zipped.foreach(conf.set)
    +    val sc = new SparkContext(conf)
    +
    +    try f(createSQLContext(sc)) finally {
    +      sc.stop()
    +    }
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, and then calls `f`
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)
    +     (f: SQLContext => Unit)
    +     (sqlContext: SQLContext): Unit = {
    +    val (keys, values) = pairs.unzip
    +
    +    (keys, values).zipped.foreach(sqlContext.conf.setConfString)
    +
    +    f(sqlContext)
    +  }
    +
    +  protected def createSQLContext(sc: SparkContext): SQLContext = {
    +    new SQLContext(sc)
    +  }
    +
    +  test("SPARK-10466 mapside external sorting for UnsafeRow") {
    +    withSparkConf(
    +      ("spark.shuffle.sort.bypassMergeThreshold", "1"),
    +      ("spark.shuffle.memoryFraction", "0.005")) {
    +      withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2") { sqlContext =>
    +        import sqlContext.implicits._
    +        sqlContext.sparkContext.parallelize(1 to 10000000, 5) // hopefully big enough for data spill
    --- End diff --
    
    also, you probably don't need to explicitly set the SQL shuffle partitions. Just set the `bypassMergeThreshold` to 0.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139174505
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42241/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138882920
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138648410
  
    @chenghao-intel Can you add a unit test?




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39234361
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeRowSerializerSuite.scala ---
    @@ -40,9 +44,15 @@ class ClosableByteArrayInputStream(buf: Array[Byte]) extends ByteArrayInputStrea
     class UnsafeRowSerializerSuite extends SparkFunSuite {
     
       private def toUnsafeRow(row: Row, schema: Array[DataType]): UnsafeRow = {
    -    val internalRow = CatalystTypeConverters.convertToCatalyst(row).asInstanceOf[InternalRow]
    +    val converter = unsafeRowConverter(schema)
    +    converter(row)
    +  }
    +
    +  private def unsafeRowConverter(schema: Array[DataType]): Row => UnsafeRow = {
    --- End diff --
    
    I mean we can just inline it in `toUnsafeRow`




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139125161
  
    retest this please




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138957889
  
    LGTM




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39002021
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -143,7 +143,7 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
           override def readKey[T: ClassTag](): T = {
             // We skipped serialization of the key in writeKey(), so just return a dummy value since
             // this is going to be discarded anyways.
    -        null.asInstanceOf[T]
    +        (-1).asInstanceOf[T]
    --- End diff --
    
    then of course we need to relax the assertion in `writeKey` and add a comment that explains why we need to accept `key == null` there.
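
The relaxed assertion discussed above can be illustrated with a small, self-contained sketch. It is not Spark's `UnsafeRowSerializer`: `KeylessStream` and `SpillRoundTrip` are hypothetical names, and the model assumes only what the diff and the stack trace suggest, namely that keys are never serialized, `readKey` returns a dummy `null`, and a spilled record is later passed through `writeKey` again.

```scala
// Toy model of a serializer stream that never writes keys to the wire.
// Hypothetical names; this is not Spark's UnsafeRowSerializer.
final class KeylessStream {
  private val values = scala.collection.mutable.ArrayBuffer.empty[String]

  // Relaxed check: accept a real Int partition id (first write) or the
  // null placeholder that readKey hands back after a spill.
  def writeKey(key: Any): KeylessStream = {
    assert(key == null || key.isInstanceOf[Int], s"unexpected key: $key")
    this
  }

  def writeValue(value: String): KeylessStream = {
    values += value
    this
  }

  // Keys were never serialized, so there is nothing to read; return a dummy.
  def readKey(): Any = null

  def readValues(): List[String] = values.toList
}

object SpillRoundTrip {
  def run(): List[String] = {
    val spill = new KeylessStream
    spill.writeKey(3).writeValue("row-a") // initial write: Int partition id
    val keyAfterSpill = spill.readKey()   // read back: dummy null key

    val merged = new KeylessStream
    // Re-writing the spilled record passes the dummy key through writeKey.
    // A strict `key.isInstanceOf[Int]` assertion would fail on this call.
    spill.readValues().foreach(v => merged.writeKey(keyAfterSpill).writeValue(v))
    merged.readValues()
  }
}
```

In this sketch the only change needed to survive the spill round trip is the `key == null ||` clause; the comment next to it plays the role of the explanation requested in the review.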




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39001955
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MiniSparkSQLClusterSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.apache.spark.{SparkFunSuite, SparkContext, SparkConf}
    +
    +class MiniSparkSQLClusterSuite extends SparkFunSuite {
    --- End diff --
    
    Ideally, we would not need a dedicated test suite for this bug fix. However, there are some other issues that probably require re-creating the SparkContext with a different SparkConf.
    For example: https://issues.apache.org/jira/browse/SPARK-10474




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39237533
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeRowSerializerSuite.scala ---
    @@ -40,9 +44,15 @@ class ClosableByteArrayInputStream(buf: Array[Byte]) extends ByteArrayInputStrea
     class UnsafeRowSerializerSuite extends SparkFunSuite {
     
       private def toUnsafeRow(row: Row, schema: Array[DataType]): UnsafeRow = {
    -    val internalRow = CatalystTypeConverters.convertToCatalyst(row).asInstanceOf[InternalRow]
    +    val converter = unsafeRowConverter(schema)
    +    converter(row)
    +  }
    +
    +  private def unsafeRowConverter(schema: Array[DataType]): Row => UnsafeRow = {
    --- End diff --
    
    Yes, I see what you mean. If we inline it in `toUnsafeRow`, every call to `toUnsafeRow` creates a new `Converter` instance for the schema, which is actually very expensive because creating the `converter` instance uses codegen internally.
    
    We should probably remove the `toUnsafeRow` function in the future, since it always causes a performance problem that people may not even notice.
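
The cost argument above (build the converter once per schema, then reuse it for every row) can be sketched without any Spark dependency. Everything here is a hypothetical stand-in: a "schema" is just a list of column names, `buildConverter` imitates an expensive code-generated factory, and `buildCount` exists only to make the setup cost visible.

```scala
// Hypothetical stand-ins: a "schema" is a list of column names and a
// converter turns a row (Seq[Any]) into a delimited string.
object ConverterReuse {
  var buildCount = 0 // counts how often the costly setup runs

  // Stand-in for an expensive, code-generated converter factory.
  def buildConverter(schema: Seq[String]): Seq[Any] => String = {
    buildCount += 1
    row => schema.zip(row).map { case (name, v) => s"$name=$v" }.mkString(",")
  }

  // Inlined style: every call pays the setup cost again.
  def convertOne(row: Seq[Any], schema: Seq[String]): String =
    buildConverter(schema)(row)

  // Reuse style: build once per schema, then apply to every row.
  def convertAll(rows: Seq[Seq[Any]], schema: Seq[String]): Seq[String] = {
    val converter = buildConverter(schema)
    rows.map(converter)
  }
}
```

For three rows, `convertAll` runs the setup once while three calls to `convertOne` run it three times, and both produce the same output. How large the real difference is depends on how costly the actual converter construction is, which this sketch does not measure.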




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139102349
  
      [Test build #42235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42235/consoleFull) for   PR 8635 at commit [`68ff3d3`](https://github.com/apache/spark/commit/68ff3d38a39753c9e45b9222d4c32c541030f19c).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139119721
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138758708
  
    @chenghao-intel thanks for adding the test. When I posted the code reproduction it wasn't meant as unit test code, but for those following this issue to reproduce it. Given that we understand the root cause of this issue I would prefer to have a finer-grained test that doesn't rely on thresholds.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138166182
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138290017
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139330795
  
    The latest commit actually already passed tests:
    ```
    Test build #42236 has finished for PR 8635 at commit e8b27b5.
    This patch passes all tests.
    ```
    I'm merging this into master and 1.5. Thanks @chenghao-intel.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139204109
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39002129
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MiniSparkSQLClusterSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import org.apache.spark.{SparkFunSuite, SparkContext, SparkConf}
    +
    +class MiniSparkSQLClusterSuite extends SparkFunSuite {
    +  /**
    +   * Create a spark context with specified configuration.
    +   */
    +  protected def withSparkConf(sparkConfs: (String, String)*)(f: (SQLContext) => Unit): Unit = {
    +    val (keys, values) = sparkConfs.unzip
    +    val conf = new SparkConf()
    +      .setMaster("local[1]")
    +      .setAppName("testing")
    +
    +    (keys, values).zipped.foreach(conf.set)
    +    val sc = new SparkContext(conf)
    +
    +    try f(createSQLContext(sc)) finally {
    +      sc.stop()
    +    }
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, and then calls `f`
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)
    +     (f: SQLContext => Unit)
    +     (sqlContext: SQLContext): Unit = {
    +    val (keys, values) = pairs.unzip
    +
    +    (keys, values).zipped.foreach(sqlContext.conf.setConfString)
    +
    +    f(sqlContext)
    +  }
    +
    +  protected def createSQLContext(sc: SparkContext): SQLContext = {
    +    new SQLContext(sc)
    +  }
    +
    +  test("SPARK-10466 mapside external sorting for UnsafeRow") {
    +    withSparkConf(
    +      ("spark.shuffle.sort.bypassMergeThreshold", "1"),
    +      ("spark.shuffle.memoryFraction", "0.005")) {
    +      withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2") { sqlContext =>
    +        import sqlContext.implicits._
    +        sqlContext.sparkContext.parallelize(1 to 10000000, 5) // hopefully big enough for data spill
    --- End diff --
    
    This test is rather brittle. Can you write a more fine-grained test for `ExternalSorter`, one that asserts `numSpills > 0` to make sure it's actually covering the merge code path? I would add it in `ExternalSorterSuite` itself instead of creating a whole new file that doesn't add much value.
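
    The testing pattern requested above (assert that a spill really happened before checking the result) can be sketched in isolation. This is a minimal illustration only: `ToySpillingSorter`, `maxInMemory`, and `SpillAssertionSketch` are hypothetical names invented for this sketch and are not Spark's `ExternalSorter` API.

    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical stand-in for a spilling sorter; NOT Spark's ExternalSorter.
    final class ToySpillingSorter(maxInMemory: Int) {
      private val buffer = ArrayBuffer.empty[Int]
      private val spills = ArrayBuffer.empty[Vector[Int]]

      // Number of times the in-memory buffer was flushed as a sorted run.
      def numSpills: Int = spills.size

      def insertAll(records: Iterator[Int]): Unit = records.foreach { r =>
        buffer += r
        if (buffer.size >= maxInMemory) {
          spills += buffer.sorted.toVector // "spill" the buffer as a sorted run
          buffer.clear()
        }
      }

      // Combine the spilled runs with whatever is still in memory.
      def sortedIterator: Iterator[Int] =
        (spills.flatten ++ buffer).sorted.iterator
    }

    object SpillAssertionSketch {
      def main(args: Array[String]): Unit = {
        val sorter = new ToySpillingSorter(maxInMemory = 100)
        sorter.insertAll((1000 to 1 by -1).iterator)
        // Guard first: fail loudly if the data never exercised the spill path.
        assert(sorter.numSpills > 0, "test data did not trigger a spill")
        assert(sorter.sortedIterator.toSeq == (1 to 1000))
      }
    }
    ```

    Checking the spill count before the output means the test fails if the spill path was never taken, rather than passing without exercising it.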




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139126346
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138758196
  
      [Test build #42178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42178/consoleFull) for   PR 8635 at commit [`c47c53c`](https://github.com/apache/spark/commit/c47c53cdf097a421364f56bd26dec75a9d2d0e33).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139089246
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42227/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r38991612
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -72,7 +72,6 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
         override def writeKey[T: ClassTag](key: T): SerializationStream = {
           // The key is only needed on the map side when computing partition ids. It does not need to
           // be shuffled.
    -      assert(key.isInstanceOf[Int])
    --- End diff --
    
    Yeah, I like that better. We can't have a partition ID of `-1`.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138289919
  
      [Test build #42090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42090/console) for   PR 8635 at commit [`229ce8a`](https://github.com/apache/spark/commit/229ce8a08223a46d992c7bc6df4880c9595d35e8).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138776998
  
      [Test build #42178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42178/console) for   PR 8635 at commit [`c47c53c`](https://github.com/apache/spark/commit/c47c53cdf097a421364f56bd26dec75a9d2d0e33).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138883026
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138777054
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139204124
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139093387
  
      [Test build #1734 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1734/consoleFull) for   PR 8635 at commit [`b8dd7eb`](https://github.com/apache/spark/commit/b8dd7eb765df0e48991eb231a4898a513100f133).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138145967
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39193337
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeRowSerializerSuite.scala ---
    @@ -40,9 +44,15 @@ class ClosableByteArrayInputStream(buf: Array[Byte]) extends ByteArrayInputStrea
     class UnsafeRowSerializerSuite extends SparkFunSuite {
     
       private def toUnsafeRow(row: Row, schema: Array[DataType]): UnsafeRow = {
    -    val internalRow = CatalystTypeConverters.convertToCatalyst(row).asInstanceOf[InternalRow]
    +    val converter = unsafeRowConverter(schema)
    +    converter(row)
    +  }
    +
    +  private def unsafeRowConverter(schema: Array[DataType]): Row => UnsafeRow = {
    --- End diff --
    
    this method seems strictly unnecessary... we can just remove it in the future.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139126255
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139084523
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8635




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r38828491
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -72,7 +72,6 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
         override def writeKey[T: ClassTag](key: T): SerializationStream = {
           // The key is only needed on the map side when computing partition ids. It does not need to
           // be shuffled.
    -      assert(key.isInstanceOf[Int])
    --- End diff --
    
    `key` may be null, see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala#L146
    
    This happens with external sorting (with data spill).
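
    To illustrate why a null key trips the removed assertion: in Scala, `isInstanceOf[Int]` evaluates to false for `null`, so `assert(key.isInstanceOf[Int])` throws as soon as a null key reaches it. The sketch below is a standalone illustration; `strictWriteKey` and `lenientWriteKey` are hypothetical helpers, and the null-tolerant check is an assumption about one possible relaxation, not necessarily what this PR settled on.

    ```scala
    import scala.reflect.ClassTag

    object WriteKeyAssertSketch {
      // Hypothetical stand-in for the strict check removed in the diff above.
      def strictWriteKey[T: ClassTag](key: T): Unit =
        assert(key.isInstanceOf[Int])

      // Hypothetical relaxed check (an assumption): accept null or an Int.
      def lenientWriteKey[T: ClassTag](key: T): Unit =
        assert(key == null || key.isInstanceOf[Int])

      def main(args: Array[String]): Unit = {
        strictWriteKey(1)          // an Int key passes both checks
        lenientWriteKey(1)
        lenientWriteKey[Any](null) // a null key passes only the relaxed check

        // The strict check rejects null (assuming assertions are not elided).
        val rejected =
          try { strictWriteKey[Any](null); false }
          catch { case _: AssertionError => true }
        println(s"strict check rejects null key: $rejected")
      }
    }
    ```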




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139174504
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139103619
  
      [Test build #42236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42236/consoleFull) for   PR 8635 at commit [`e8b27b5`](https://github.com/apache/spark/commit/e8b27b5515251c856b7711c0c253e3b92e354fab).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138268307
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139127407
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139119693
  
      [Test build #42235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42235/console) for   PR 8635 at commit [`68ff3d3`](https://github.com/apache/spark/commit/68ff3d38a39753c9e45b9222d4c32c541030f19c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139174454
  
      [Test build #42241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42241/console) for   PR 8635 at commit [`e8b27b5`](https://github.com/apache/spark/commit/e8b27b5515251c856b7711c0c253e3b92e354fab).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139084537
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139101929
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138910138
  
      [Test build #42202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42202/console) for   PR 8635 at commit [`7f09a62`](https://github.com/apache/spark/commit/7f09a62c7ce9a4ed3f607831ebec72ef4c799c85).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138166137
  
      [Test build #42079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42079/console) for   PR 8635 at commit [`684cdb6`](https://github.com/apache/spark/commit/684cdb6f92b973ab872996e1f434324f3160b4be).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138268321
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138290018
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42090/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139084506
  
    Thank you @andrewor14, your code is much simpler; I have already adopted it. :)




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138995034
  
    @chenghao-intel thanks for taking the time to write the test. However, I think it is more complicated than necessary. I was able to add a much smaller test to the existing `UnsafeRowSerializerSuite`. Can you use this one instead?
    ```
    test("SPARK-10466: external sorter spilling with unsafe row serializer") {
      val conf = new SparkConf()
        .set("spark.shuffle.spill.initialMemoryThreshold", "1024")
        .set("spark.shuffle.sort.bypassMergeThreshold", "0")
        .set("spark.shuffle.memoryFraction", "0.0001")
      var sc: SparkContext = null
      var outputFile: File = null
      try {
        sc = new SparkContext("local", "test", conf)
        outputFile = File.createTempFile("test-unsafe-row-serializer-spill", "")
        val data = (1 to 1000).iterator.map { i =>
          val internalRow = CatalystTypeConverters.convertToCatalyst(Row(i)).asInstanceOf[InternalRow]
          val unsafeRow = UnsafeProjection.create(Array(IntegerType: DataType)).apply(internalRow)
          (i, unsafeRow)
        }
        val ser = new UnsafeRowSerializer(numFields = 2)
        val part = new HashPartitioner(10)
        val sorter = new ExternalSorter[Int, UnsafeRow, UnsafeRow](
          partitioner = Some(part), serializer = Some(ser))
    
        // Ensure we spilled something and have to merge them later
        assert(sorter.numSpills === 0)
        sorter.insertAll(data)
        assert(sorter.numSpills > 0)
    
        // Merging spilled files should not throw assertion error
        val taskContext = new TaskContextImpl(0, 0, 0, 0, null, null, InternalAccumulator.create(sc))
        taskContext.taskMetrics.shuffleWriteMetrics = Some(new ShuffleWriteMetrics)
        sorter.writePartitionedFile(ShuffleBlockId(0, 0, 0), taskContext, outputFile)
    
      } finally {
        // Clean up
        if (sc != null) {
          sc.stop()
        }
        if (outputFile != null) {
          outputFile.delete()
        }
      }
    }
    ```




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138166185
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42079/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139113390
  
      [Test build #1733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1733/console) for   PR 8635 at commit [`b8dd7eb`](https://github.com/apache/spark/commit/b8dd7eb765df0e48991eb231a4898a513100f133).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class BlockFetchException(messages: String, throwable: Throwable)`





[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138146310
  
    cc @rxin 




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138145857
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139235425
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138757367
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139111031
  
      [Test build #1734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1734/console) for   PR 8635 at commit [`b8dd7eb`](https://github.com/apache/spark/commit/b8dd7eb765df0e48991eb231a4898a513100f133).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139127349
  
      [Test build #42241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42241/consoleFull) for   PR 8635 at commit [`e8b27b5`](https://github.com/apache/spark/commit/e8b27b5515251c856b7711c0c253e3b92e354fab).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138742064
  
    Yes, that's simpler for a unit test; I will steal it. :)




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r38990109
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -72,7 +72,6 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
         override def writeKey[T: ClassTag](key: T): SerializationStream = {
           // The key is only needed on the map side when computing partition ids. It does not need to
           // be shuffled.
    -      assert(key.isInstanceOf[Int])
    --- End diff --
    
    How about changing the dummy value to a number instead of `null`?




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138740128
  
    By the way, I was able to come up with a smaller reproduction:
    ```
    bin/spark-shell --master local \
      --conf spark.shuffle.memoryFraction=0.005 \
      --conf spark.shuffle.sort.bypassMergeThreshold=0
    
    sc.parallelize(1 to 2 * 1000 * 1000, 10)
      .map { i => (i, i) }.toDF("a", "b").groupBy("b").avg().count()
    ```




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39233455
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeRowSerializerSuite.scala ---
    @@ -40,9 +44,15 @@ class ClosableByteArrayInputStream(buf: Array[Byte]) extends ByteArrayInputStrea
     class UnsafeRowSerializerSuite extends SparkFunSuite {
     
       private def toUnsafeRow(row: Row, schema: Array[DataType]): UnsafeRow = {
    -    val internalRow = CatalystTypeConverters.convertToCatalyst(row).asInstanceOf[InternalRow]
    +    val converter = unsafeRowConverter(schema)
    +    converter(row)
    +  }
    +
    +  private def unsafeRowConverter(schema: Array[DataType]): Row => UnsafeRow = {
    --- End diff --
    
    Actually, `UnsafeProjection.create(schema)` runs code generation, which takes a long time when we have to generate a large number of `UnsafeRow`s.
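
The cost concern above can be illustrated without Spark. The following is a minimal sketch, assuming a hypothetical `buildConverter` factory that stands in for an expensive call such as `UnsafeProjection.create(schema)`; the object name, the counter, and the `Seq[Int]`/`Vector[Int]` row types are illustrative and are not part of the pull request.

```scala
object ConverterReuseSketch {
  // Counts how many times the (pretend) expensive factory runs.
  var buildCount: Int = 0

  // Hypothetical stand-in for a costly factory such as code generation.
  // It returns a function that converts one row at a time.
  def buildConverter(numFields: Int): Seq[Int] => Vector[Int] = {
    buildCount += 1
    (row: Seq[Int]) => {
      require(row.length == numFields, s"expected $numFields fields")
      row.toVector
    }
  }

  // Builds a new converter for every row: one factory call per row.
  def convertEach(rows: Seq[Seq[Int]], numFields: Int): Seq[Vector[Int]] =
    rows.map(row => buildConverter(numFields)(row))

  // Builds the converter once and reuses it: a single factory call in total.
  def convertReusing(rows: Seq[Seq[Int]], numFields: Int): Seq[Vector[Int]] = {
    val converter = buildConverter(numFields)
    rows.map(converter)
  }
}
```

Both methods return the same rows; only the number of factory calls differs, which is why a test that produces many rows may prefer to create the converter once outside the loop.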




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139103058
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139127409
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42236/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39031470
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MiniSparkSQLClusterSuite.scala ---
    @@ -0,0 +1,168 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +import java.io.File
    +import java.util.UUID
    +
    +import org.apache.spark.executor.{ShuffleWriteMetrics, TaskMetrics}
    +import org.apache.spark.serializer.SerializerInstance
    +import org.apache.spark.sql.catalyst.{InternalRow, CatalystTypeConverters}
    +import org.apache.spark.sql.catalyst.expressions.{UnsafeProjection, UnsafeRow}
    +import org.apache.spark.sql.execution.UnsafeRowSerializer
    +import org.apache.spark.sql.types.{IntegerType, DataType}
    +import org.apache.spark.storage._
    +import org.apache.spark.util.Utils
    +import org.apache.spark._
    +import org.apache.spark.util.collection.ExternalSorter
    +import org.mockito.Answers.RETURNS_SMART_NULLS
    +import org.mockito.Matchers._
    +import org.mockito.Mockito._
    +import org.mockito.invocation.InvocationOnMock
    +import org.mockito.stubbing.Answer
    +import org.mockito.{MockitoAnnotations, Mock}
    +import org.scalatest.BeforeAndAfterEach
    +
    +import scala.collection.mutable
    +import scala.collection.mutable.ArrayBuffer
    +
    +class MiniSparkSQLClusterSuite extends SparkFunSuite with BeforeAndAfterEach {
    +
    +  @Mock(answer = RETURNS_SMART_NULLS) private var blockManager: BlockManager = _
    +  @Mock(answer = RETURNS_SMART_NULLS) private var diskBlockManager: DiskBlockManager = _
    +  @Mock(answer = RETURNS_SMART_NULLS) private var taskContext: TaskContext = _
    +
    +  private var taskMetrics: TaskMetrics = _
    +  private var shuffleWriteMetrics: ShuffleWriteMetrics = _
    +  private var tempDir: File = _
    +  private var outputFile: File = _
    +  private val temporaryFilesCreated: mutable.Buffer[File] = new ArrayBuffer[File]()
    +  private val blockIdToFileMap: mutable.Map[BlockId, File] = new mutable.HashMap[BlockId, File]
    +  private val shuffleBlockId: ShuffleBlockId = new ShuffleBlockId(0, 0, 0)
    +
    +  override def beforeEach(): Unit = {
    +    tempDir = Utils.createTempDir()
    +    outputFile = File.createTempFile("shuffle", null, tempDir)
    +    shuffleWriteMetrics = new ShuffleWriteMetrics
    +    taskMetrics = new TaskMetrics
    +    taskMetrics.shuffleWriteMetrics = Some(shuffleWriteMetrics)
    +    MockitoAnnotations.initMocks(this)
    +    when(taskContext.taskMetrics()).thenReturn(taskMetrics)
    +    import InternalAccumulator._
    +    when(taskContext.internalMetricsToAccumulators).thenReturn(
    +      Map(
    +        PEAK_EXECUTION_MEMORY ->
    +        new Accumulator(
    +          0L, AccumulatorParam.LongAccumulatorParam, Some(PEAK_EXECUTION_MEMORY), internal = true)))
    +    when(blockManager.diskBlockManager).thenReturn(diskBlockManager)
    +    when(blockManager.getDiskWriter(
    +      any[BlockId],
    +      any[File],
    +      any[SerializerInstance],
    +      anyInt(),
    +      any[ShuffleWriteMetrics]
    +    )).thenAnswer(new Answer[DiskBlockObjectWriter] {
    +      override def answer(invocation: InvocationOnMock): DiskBlockObjectWriter = {
    +        val args = invocation.getArguments
    +        new DiskBlockObjectWriter(
    +          args(0).asInstanceOf[BlockId],
    +          args(1).asInstanceOf[File],
    +          args(2).asInstanceOf[SerializerInstance],
    +          args(3).asInstanceOf[Int],
    +          compressStream = identity,
    +          syncWrites = false,
    +          args(4).asInstanceOf[ShuffleWriteMetrics]
    +        )
    +      }
    +    })
    +    when(diskBlockManager.createTempShuffleBlock()).thenAnswer(
    +      new Answer[(TempShuffleBlockId, File)] {
    +        override def answer(invocation: InvocationOnMock): (TempShuffleBlockId, File) = {
    +          val blockId = new TempShuffleBlockId(UUID.randomUUID)
    +          val file = File.createTempFile(blockId.toString, null, tempDir)
    +          blockIdToFileMap.put(blockId, file)
    +          temporaryFilesCreated.append(file)
    +          (blockId, file)
    +        }
    +      })
    +    when(diskBlockManager.getFile(any[BlockId])).thenAnswer(
    +      new Answer[File] {
    +        override def answer(invocation: InvocationOnMock): File = {
    +          blockIdToFileMap.get(invocation.getArguments.head.asInstanceOf[BlockId]).get
    +        }
    +      })
    +  }
    +
    +  override def afterEach(): Unit = {
    +    Utils.deleteRecursively(tempDir)
    +    blockIdToFileMap.clear()
    +    temporaryFilesCreated.clear()
    +  }
    +
    +  private def toUnsafeRow(row: Row, schema: Array[DataType]): UnsafeRow = {
    +    val internalRow = CatalystTypeConverters.convertToCatalyst(row).asInstanceOf[InternalRow]
    +    val converter = UnsafeProjection.create(schema)
    +    converter.apply(internalRow)
    +  }
    +
    +  /**
    +   * Create a spark context with specified configuration, and the calls `f`
    +   */
    +  protected def withSparkConf(sparkConfs: (String, String)*)(f: (SparkContext) => Unit): Unit = {
    +    val (keys, values) = sparkConfs.unzip
    +    val conf = new SparkConf(true)
    +      .setMaster("local[1]")
    +      .setAppName("testing")
    +
    +    var sc: SparkContext = null
    +    val env = SparkEnv.get
    +    Utils.tryWithSafeFinally {
    +      (keys, values).zipped.foreach(conf.set)
    +      sc = new SparkContext(conf)
    +      f(sc)
    +    } {
    +      sc.stop()
    +      SparkEnv.set(env)
    +    }
    +  }
    +
    +  test("SPARK-10466 external sorting for UnsafeRow with spilling") {
    +    withSparkConf(
    +      ("spark.shuffle.sort.bypassMergeThreshold", "0"),
    +      ("spark.shuffle.spill.initialMemoryThreshold", "1024"), // 1k
    +      ("spark.shuffle.memoryFraction", "0.0001")) { sc =>
    +
    +      // create unsafe rows
    +      val count = 10000
    +      val unsafeRowIt = (1 to count).iterator.map { i =>
    +        (i, toUnsafeRow(Row(i, i, i), Array(IntegerType, IntegerType, IntegerType)))
    +      }
    +
    +      val serializer = new UnsafeRowSerializer(3)
    +      val sorter = new ExternalSorter[Int, UnsafeRow, UnsafeRow](
    +        None, None, Some(implicitly[Ordering[Int]]), Some(serializer))
    +      sorter.insertAll(unsafeRowIt)
    +      // Make sure it spilled
    +      assert(sc.env.blockManager.diskBlockManager.getAllFiles().length > 0)
    +
    +      assert(sorter.writePartitionedFile(shuffleBlockId, taskContext, outputFile).sum > 0)
    --- End diff --
    
    An exception will be thrown here if we don't change `UnsafeRowSerializer` as above.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138148768
  
      [Test build #42079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42079/consoleFull) for   PR 8635 at commit [`684cdb6`](https://github.com/apache/spark/commit/684cdb6f92b973ab872996e1f434324f3160b4be).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139101946
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139127315
  
      [Test build #42236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42236/console) for   PR 8635 at commit [`e8b27b5`](https://github.com/apache/spark/commit/e8b27b5515251c856b7711c0c253e3b92e354fab).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138910313
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42202/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138910310
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139119722
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42235/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139103065
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r38980598
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -72,7 +72,6 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
         override def writeKey[T: ClassTag](key: T): SerializationStream = {
           // The key is only needed on the map side when computing partition ids. It does not need to
           // be shuffled.
    -      assert(key.isInstanceOf[Int])
    --- End diff --
    
    so isn't the correct fix here to assert the following?
    ```
    assert(key == null || key.isInstanceOf[Int])
    ```
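
A minimal sketch of how the two forms of the assertion treat a null key, assuming the key reaches the check as a value of a generic type `T`, as in `writeKey[T: ClassTag](key: T)`. The names `KeyCheckSketch`, `strictCheck` and `nullTolerantCheck` are hypothetical and are not part of Spark.

```scala
object KeyCheckSketch {
  // Mirrors the removed assertion: null is not an instance of Int,
  // so this returns false for a null key.
  def strictCheck[T](key: T): Boolean = key.isInstanceOf[Int]

  // Mirrors the suggested assertion: null is accepted explicitly,
  // any other key must still be an Int.
  def nullTolerantCheck[T](key: T): Boolean =
    key == null || key.isInstanceOf[Int]
}
```

Under these assumptions, `strictCheck[Any](null)` is false while `nullTolerantCheck[Any](null)` is true, and both checks still accept an `Int` key and reject other non-null types.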




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-138777055
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42178/




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139093521
  
    [Test build #1733 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1733/consoleFull) for PR 8635 at commit [`b8dd7eb`](https://github.com/apache/spark/commit/b8dd7eb765df0e48991eb231a4898a513100f133).




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r38983250
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -72,7 +72,6 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
         override def writeKey[T: ClassTag](key: T): SerializationStream = {
           // The key is only needed on the map side when computing partition ids. It does not need to
           // be shuffled.
    -      assert(key.isInstanceOf[Int])
    --- End diff --
    
    wouldn't the right thing to do here be to allow nulls as well?
    ```
    assert(key == null || key.isInstanceOf[Int])
    ```




[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8635#discussion_r39001977
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/UnsafeRowSerializer.scala ---
    @@ -143,7 +143,7 @@ private class UnsafeRowSerializerInstance(numFields: Int) extends SerializerInst
           override def readKey[T: ClassTag](): T = {
             // We skipped serialization of the key in writeKey(), so just return a dummy value since
             // this is going to be discarded anyways.
    -        null.asInstanceOf[T]
    +        (-1).asInstanceOf[T]
    --- End diff --
    
    Actually this looks really weird now and doesn't really match the comment. Would you mind changing it back?
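
A minimal sketch of why the null placeholder matters for the `writeKey` assertion, assuming the value returned by a `readKey`-style method stays at the generic type `T` before it is checked again (presumably what happens when a spilled record is read back and rewritten). The names `DummyKeySketch`, `dummyKey`, `placeholderIsNull` and `placeholderPassesIntCheck` are hypothetical and are not part of Spark.

```scala
object DummyKeySketch {
  // Stand-in for a readKey-style method that skips deserialization and
  // returns a placeholder; at a generic type this is a null reference.
  def dummyKey[T]: T = null.asInstanceOf[T]

  // Reads the placeholder without leaving the generic context.
  def placeholderIsNull[T]: Boolean = {
    val key: T = dummyKey[T]
    key == null
  }

  // Applies the Int-only check from the removed assertion to the placeholder.
  def placeholderPassesIntCheck[T]: Boolean = {
    val key: T = dummyKey[T]
    key.isInstanceOf[Int]
  }
}
```

In this sketch `placeholderIsNull[Int]` is true and `placeholderPassesIntCheck[Int]` is false, which is consistent with keeping `null.asInstanceOf[T]` in `readKey` and making the check in `writeKey` tolerate null.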


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10466][SQL] UnsafeRow SerDe exception w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8635#issuecomment-139204569
  
    [Test build #42263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42263/consoleFull) for PR 8635 at commit [`e8b27b5`](https://github.com/apache/spark/commit/e8b27b5515251c856b7711c0c253e3b92e354fab).

