You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2015/09/24 22:47:16 UTC

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/8907

    [SPARK-10811] [SQL] Eliminates unnecessary byte array copying

    When reading Parquet string and binary-backed decimal values, Parquet `Binary.getBytes` always returns a copied byte array, which is unnecessary. Since the underlying implementation of `Binary` values there is guaranteed to be `ByteArraySliceBackedBinary`, we can use `Binary.toByteBuffer.array()` to steal the underlying byte array without copying it.
    
    This brings performance benefits when scanning Parquet string and binary-backed decimal columns. Note that, this trick doesn't cover binary-backed decimals with precision greater than 18.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-10811/eliminate-array-copying

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8907
    
----
commit b99158e29878f5ba73b0beb3c19f33429ecd30cd
Author: Cheng Lian <li...@databricks.com>
Date:   2015-09-24T20:41:10Z

    Eliminates unnecessary byte array copying

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143062763
  
    Hm, after some profiling, I found that stealing byte arrays using `ByteBuffer` is actual faster than the `ByteArrayThief` hack I employed in the 2nd commit. This is probably because of virtual function dispatching cost exposed by `OutputStream.write`. Will revert my last commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8907#discussion_r40371067
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala ---
    @@ -332,24 +338,30 @@ private[parquet] class CatalystRowConverter(
         private def toDecimal(value: Binary): Decimal = {
           val precision = decimalType.precision
           val scale = decimalType.scale
    -      val bytes = value.getBytes
     
           if (precision <= CatalystSchemaConverter.MAX_PRECISION_FOR_INT64) {
    -        // Constructs a `Decimal` with an unscaled `Long` value if possible.
    +        // Constructs a `Decimal` with an unscaled `Long` value if possible.  The underlying
    +        // `ByteBuffer` implementation is guaranteed to be `HeapByteBuffer`, so here we are using
    +        // `Binary.toByteBuffer.array()` to steal the underlying byte array without copying it.
    +        val buffer = value.toByteBuffer
    +        val bytes = buffer.array()
    +        val start = buffer.position()
    +        val end = buffer.limit()
    +
             var unscaled = 0L
    -        var i = 0
    +        var i = start
     
    -        while (i < bytes.length) {
    +        while (i < end) {
               unscaled = (unscaled << 8) | (bytes(i) & 0xff)
               i += 1
             }
     
    -        val bits = 8 * bytes.length
    +        val bits = 8 * (end - start)
             unscaled = (unscaled << (64 - bits)) >> (64 - bits)
             Decimal(unscaled, precision, scale)
           } else {
             // Otherwise, resorts to an unscaled `BigInteger` instead.
    -        Decimal(new BigDecimal(new BigInteger(bytes), scale), precision, scale)
    +        Decimal(new BigDecimal(new BigInteger(value.getBytes), scale), precision, scale)
    --- End diff --
    
    For decimals whose precision is greater than 18, we still need to copy the byte array anyway to construct Java `BigInteger` instances.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143069919
  
      [Test build #42979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42979/console) for   PR 8907 at commit [`b99158e`](https://github.com/apache/spark/commit/b99158e29878f5ba73b0beb3c19f33429ecd30cd).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143096239
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-144296080
  
    @davies @cloud-fan Thanks for the review! I'm merging this to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143070006
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143048674
  
      [Test build #42979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42979/consoleFull) for   PR 8907 at commit [`b99158e`](https://github.com/apache/spark/commit/b99158e29878f5ba73b0beb3c19f33429ecd30cd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143063693
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8907#discussion_r40371826
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala ---
    @@ -302,7 +302,13 @@ private[parquet] class CatalystRowConverter(
         }
     
         override def addBinary(value: Binary): Unit = {
    -      updater.set(UTF8String.fromBytes(value.getBytes))
    +      // The underlying `ByteBuffer` implementation is guaranteed to be `HeapByteBuffer`, so here we
    +      // are using `Binary.toByteBuffer.array()` to steal the underlying byte array without copying
    +      // it.
    +      val buffer = value.toByteBuffer
    --- End diff --
    
    this still allocates the byte buffer object, doesn't it?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143057207
  
      [Test build #42984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42984/consoleFull) for   PR 8907 at commit [`d59daf6`](https://github.com/apache/spark/commit/d59daf65347ac2ea3964872517b9431431f79dfa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143047536
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143078366
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143085812
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143056483
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143063722
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8907#discussion_r40372370
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala ---
    @@ -302,7 +302,13 @@ private[parquet] class CatalystRowConverter(
         }
     
         override def addBinary(value: Binary): Unit = {
    -      updater.set(UTF8String.fromBytes(value.getBytes))
    +      // The underlying `ByteBuffer` implementation is guaranteed to be `HeapByteBuffer`, so here we
    +      // are using `Binary.toByteBuffer.array()` to steal the underlying byte array without copying
    +      // it.
    +      val buffer = value.toByteBuffer
    --- End diff --
    
    Yeah, it allocates a wrapping ByteBuffer. I haven't tried your way yet (using fake `DataOutput` and `Binary.writeTo()` to steal the underlying array). Will give it a shot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143078376
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143085813
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42985/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143075010
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8907#discussion_r40373071
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala ---
    @@ -302,7 +302,13 @@ private[parquet] class CatalystRowConverter(
         }
     
         override def addBinary(value: Binary): Unit = {
    -      updater.set(UTF8String.fromBytes(value.getBytes))
    +      // The underlying `ByteBuffer` implementation is guaranteed to be `HeapByteBuffer`, so here we
    +      // are using `Binary.toByteBuffer.array()` to steal the underlying byte array without copying
    +      // it.
    +      val buffer = value.toByteBuffer
    --- End diff --
    
    I didn't mean to try mine - I think the best way is to change Parquet to give us exactly what we want.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143096240
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42992/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143047485
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143070007
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42979/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8907#discussion_r40451726
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
    @@ -145,7 +145,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
         }
       }
     
    -  def toJavaBigDecimal: java.math.BigDecimal = toBigDecimal.underlying()
    +  def toJavaBigDecimal: java.math.BigDecimal = {
    +    if (decimalVal.ne(null)) {
    +      decimalVal.underlying()
    +    } else {
    +      java.math.BigDecimal.valueOf(longVal, _scale)
    --- End diff --
    
    Originally, we always construct a Scala `BigDecimal` first, and then retrieve the underlying Java `BigDecimal`. Here we just create the Java one directly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143064861
  
      [Test build #42985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42985/consoleFull) for   PR 8907 at commit [`6d85e69`](https://github.com/apache/spark/commit/6d85e698fd83f31b5a8463d04c68374185013225).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-144252020
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143074577
  
      [Test build #42984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42984/console) for   PR 8907 at commit [`d59daf6`](https://github.com/apache/spark/commit/d59daf65347ac2ea3964872517b9431431f79dfa).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143078102
  
    The [last commit] [1] brings another ~5% performance boost. We should probably just use Java `BigDecimal` directly within Catalyst `Decimal`. I guess this would make decimals with large precision (> 18) a little bit faster. But that can be done in a separate PR.
    
    [1]: https://github.com/liancheng/spark/commit/851f91fdcd9a82ef314d4436b768933f8c54875c


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143096170
  
      [Test build #42992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42992/console) for   PR 8907 at commit [`851f91f`](https://github.com/apache/spark/commit/851f91fdcd9a82ef314d4436b768933f8c54875c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143085733
  
      [Test build #42985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42985/console) for   PR 8907 at commit [`6d85e69`](https://github.com/apache/spark/commit/6d85e698fd83f31b5a8463d04c68374185013225).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-144217445
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143078472
  
      [Test build #42992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42992/consoleFull) for   PR 8907 at commit [`851f91f`](https://github.com/apache/spark/commit/851f91fdcd9a82ef314d4436b768933f8c54875c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143075012
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42984/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8907#issuecomment-143056520
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10811] [SQL] Eliminates unnecessary byt...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8907


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org