Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2015/10/08 20:35:53 UTC

[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/9030

    [SPARK-10914] UnsafeRow serialization breaks when two machines have different Oops size.

    UnsafeRow holds 3 pieces of information when pointing to data in memory: an object, a base offset, and a length. When the row is serialized with Java or Kryo serialization and deserialized on another machine, the base offset can be wrong, because the in-memory object layout differs if the two machines have different pointer widths (compressed vs. uncompressed Oops in the JVM).
    
    To reproduce, launch Spark using
    
    MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
    
    And then run the following
    
    scala> sql("select 1 xx").collect()
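
As a rough illustration of the failure mode described above, the following self-contained Java sketch simulates two JVMs whose byte[] payload starts at different offsets. The header sizes 16 and 24 are assumptions (values commonly seen on 64-bit HotSpot with and without compressed Oops), and `BaseOffsetSketch` and its `getLong` helper are hypothetical names; the real UnsafeRow reads memory through Spark's Platform class, not through ByteBuffer.

```java
import java.nio.ByteBuffer;

public class BaseOffsetSketch {
    // Assumed byte[] header sizes, for illustration only.
    static final long COMPRESSED_OOPS_HEADER = 16;
    static final long UNCOMPRESSED_OOPS_HEADER = 24;

    // Simulates an Unsafe-style read: the offset is measured from the start of
    // the array object, so the header size must be subtracted to find the byte.
    static long getLong(byte[] payload, long absoluteOffset, long headerSize) {
        int index = Math.toIntExact(absoluteOffset - headerSize);
        return ByteBuffer.wrap(payload).getLong(index);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[16];
        ByteBuffer.wrap(payload).putLong(8, 19285L); // field lives 8 bytes into the payload

        // Sender (compressed Oops): the offset it records is header + 8 = 24.
        long senderOffset = COMPRESSED_OOPS_HEADER + 8;
        System.out.println(getLong(payload, senderOffset, COMPRESSED_OOPS_HEADER)); // prints 19285

        // Receiver (uncompressed Oops) reusing the sender's offset reads the wrong bytes.
        System.out.println(getLong(payload, senderOffset, UNCOMPRESSED_OOPS_HEADER)); // prints 0

        // Receiver recomputing the offset from its own header size reads correctly.
        long rebasedOffset = UNCOMPRESSED_OOPS_HEADER + 8;
        System.out.println(getLong(payload, rebasedOffset, UNCOMPRESSED_OOPS_HEADER)); // prints 19285
    }
}
```

The point of the sketch is only that an offset recorded on one machine is not meaningful on a machine with a different object layout, so the offset has to be recomputed on the receiving side.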


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-10914

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9030
    
----
commit 465fc8e18147b9e8cf34e0f5bcbc338d03ad4f95
Author: Reynold Xin <rx...@databricks.com>
Date:   2015-10-08T18:34:14Z

    [SPARK-10914] UnsafeRow serialization breaks when two machines have different Oops size.
    
    The problem is that UnsafeRow holds 3 pieces of information when pointing to data in memory: an object, a base offset, and a length. When the row is serialized with Java or Kryo serialization and deserialized on another machine, the base offset can be wrong, because the in-memory object layout differs if the two machines have different pointer widths (compressed vs. uncompressed Oops in the JVM).
    
    To reproduce, launch Spark using
    
    MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
    
    And then run the following
    
    scala> sql("select 1 xx").collect()
    
    (cherry picked from commit 157b2a818d3993b1321cc41fb7b30407bd13490b)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146672235
  
      [Test build #1861 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1861/consoleFull) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9030#discussion_r41551077
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala ---
    @@ -89,8 +89,13 @@ case class BroadcastHashJoin(
             // The following line doesn't run in a job so we cannot track the metric value. However, we
             // have already tracked it in the above lines. So here we can use
             // `SQLMetrics.nullLongMetric` to ignore it.
    +
    +        input.foreach { row =>
    --- End diff --
    
    going to revert this




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146649482
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146701450
  
    test this please




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146680709
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146652667
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146652257
  
      [Test build #43414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43414/consoleFull) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146651839
  
    LGTM. We have already done this for UTF8String (though without Kryo support).
    
    @cloud-fan Should we also do this for UnsafeArrayData?




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146680629
  
      [Test build #43414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43414/console) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable `





[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9030#discussion_r41558879
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UnsafeRowSuite.scala ---
    @@ -29,6 +30,32 @@ import org.apache.spark.unsafe.types.UTF8String
     
     class UnsafeRowSuite extends SparkFunSuite {
     
    +  test("UnsafeRow Java serialization") {
    +    // serializing an UnsafeRow pointing to a large buffer should only serialize the relevant data
    +    val data = new Array[Byte](1024)
    +    val row = new UnsafeRow
    +    row.pointTo(data, 1, 16)
    +    row.setLong(0, 19285)
    +
    +    val ser = new JavaSerializer(new SparkConf).newInstance()
    +    val row1 = ser.deserialize[UnsafeRow](ser.serialize(row))
    +    assert(row1.getLong(0) == 19285)
    +    assert(row1.getBaseObject().asInstanceOf[Array[Byte]].length == 16)
    +  }
    +
    +  test("UnsafeRow Kryo serialization") {
    +    // serializing an UnsafeRow pointing to a large buffer should only serialize the relevant data
    +    val data = new Array[Byte](1024)
    +    val row = new UnsafeRow
    +    row.pointTo(data, 1, 16)
    +    row.setLong(0, 19285)
    +
    +    val ser = new KryoSerializer(new SparkConf).newInstance()
    +    val row1 = ser.deserialize[UnsafeRow](ser.serialize(row))
    +    assert(row1.getLong(0) == 19285)
    +    assert(row1.getBaseObject().asInstanceOf[Array[Byte]].length == 16)
    --- End diff --
    
    This actually doesn't matter anymore with new versions of ScalaTest.





[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146721247
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9030




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146702398
  
      [Test build #43434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43434/consoleFull) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146670768
  
      [Test build #1860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1860/console) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146693557
  
      [Test build #1861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1861/console) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable `





[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146667055
  
      [Test build #1860 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1860/consoleFull) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9030#discussion_r41553070
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java ---
    @@ -596,4 +601,40 @@ public boolean anyNull() {
       public void writeToMemory(Object target, long targetOffset) {
         Platform.copyMemory(baseObject, baseOffset, target, targetOffset, sizeInBytes);
       }
    +
    +  @Override
    +  public void writeExternal(ObjectOutput out) throws IOException {
    +    byte[] bytes = getBytes();
    +    out.writeInt(bytes.length);
    +    out.writeInt(this.numFields);
    +    out.write(bytes);
    +  }
    +
    +  @Override
    +  public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    +    this.baseOffset = BYTE_ARRAY_OFFSET;
    +    this.sizeInBytes = in.readInt();
    +    this.numFields = in.readInt();
    +    this.bitSetWidthInBytes = calculateBitSetWidthInBytes(numFields);
    +    this.baseObject = new byte[sizeInBytes];
    +    in.readFully((byte[]) baseObject);
    +  }
    +
    +  @Override
    +  public void write(Kryo kryo, Output out) {
    +    byte[] bytes = getBytes();
    +    out.writeInt(bytes.length);
    +    out.writeInt(this.numFields);
    +    out.write(bytes);
    --- End diff --
    
    can this just call `writeExternal(out)`?
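
For readers following the diff above, here is a minimal, self-contained sketch of the same idea using only java.io: write the length followed by just the bytes the row covers, and rebuild a fresh, exactly sized buffer on read. `RowSlice` is a hypothetical stand-in, not Spark's UnsafeRow; it uses a plain array index instead of an Unsafe base offset, leaves out numFields, and does not cover the Kryo path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

public class RowSlice implements Externalizable {
    private byte[] buffer;    // backing array, possibly much larger than the row
    private int offset;       // index of the row's first byte within buffer
    private int sizeInBytes;  // number of bytes the row covers

    public RowSlice() {}      // public no-arg constructor required by Externalizable

    public RowSlice(byte[] buffer, int offset, int sizeInBytes) {
        this.buffer = buffer;
        this.offset = offset;
        this.sizeInBytes = sizeInBytes;
    }

    private ByteBuffer view() {
        // A view over only the row's bytes; indices are relative to the row start.
        return ByteBuffer.wrap(buffer, offset, sizeInBytes).slice();
    }

    public void setLong(int byteIndex, long value) { view().putLong(byteIndex, value); }
    public long getLong(int byteIndex) { return view().getLong(byteIndex); }
    public int bufferLength() { return buffer.length; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(sizeInBytes);
        out.write(buffer, offset, sizeInBytes); // only the relevant slice is written
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        sizeInBytes = in.readInt();
        offset = 0;                             // recomputed locally, never read from the stream
        buffer = new byte[sizeInBytes];
        in.readFully(buffer);
    }

    // Serializes and deserializes a row in memory with Java serialization.
    public static RowSlice roundTrip(RowSlice row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(row);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RowSlice) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RowSlice row = new RowSlice(new byte[1024], 1, 16);
        row.setLong(0, 19285L);
        RowSlice copy = roundTrip(row);
        System.out.println(copy.getLong(0));     // prints 19285
        System.out.println(copy.bufferLength()); // prints 16
    }
}
```

The main method mirrors the shape of the UnsafeRowSuite tests quoted elsewhere in this thread: a 16-byte row inside a 1024-byte array comes back backed by a 16-byte array.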




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146721142
  
      [Test build #43434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43434/console) for   PR 9030 at commit [`9b79e6f`](https://github.com/apache/spark/commit/9b79e6f1de4114d7a9a1dc693c079409b6846ffb).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable `





[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146724781
  
    Merging this in master & branch-1.5.





[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146649524
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146647963
  
    cc @davies and @JoshRosen 




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146648238
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9030#discussion_r41553096
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UnsafeRowSuite.scala ---
    @@ -29,6 +30,32 @@ import org.apache.spark.unsafe.types.UTF8String
     
     class UnsafeRowSuite extends SparkFunSuite {
     
    +  test("UnsafeRow Java serialization") {
    +    // serializing an UnsafeRow pointing to a large buffer should only serialize the relevant data
    +    val data = new Array[Byte](1024)
    +    val row = new UnsafeRow
    +    row.pointTo(data, 1, 16)
    +    row.setLong(0, 19285)
    +
    +    val ser = new JavaSerializer(new SparkConf).newInstance()
    +    val row1 = ser.deserialize[UnsafeRow](ser.serialize(row))
    +    assert(row1.getLong(0) == 19285)
    +    assert(row1.getBaseObject().asInstanceOf[Array[Byte]].length == 16)
    +  }
    +
    +  test("UnsafeRow Kryo serialization") {
    +    // serializing an UnsafeRow pointing to a large buffer should only serialize the relevant data
    +    val data = new Array[Byte](1024)
    +    val row = new UnsafeRow
    +    row.pointTo(data, 1, 16)
    +    row.setLong(0, 19285)
    +
    +    val ser = new KryoSerializer(new SparkConf).newInstance()
    +    val row1 = ser.deserialize[UnsafeRow](ser.serialize(row))
    +    assert(row1.getLong(0) == 19285)
    +    assert(row1.getBaseObject().asInstanceOf[Array[Byte]].length == 16)
    --- End diff --
    
    `===`




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146654269
  
    I think we should apply this to unsafe array too.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146701946
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146652670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43413/




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146648268
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146701970
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146680712
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43414/




[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9030#issuecomment-146721248
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43434/

