You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2014/09/09 06:12:56 UTC

[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/2327

    [SPARK-3294][SQL] WIP: eliminates boxing costs from in-memory columnar storage

    This is a major refactoring of the in-memory columnar storage implementation, aims to eliminate boxing costs as much as possible. The basic idea is to refactor all major interfaces into a row-based form and use them together with `SpecificMutableRow`. The difficult part is how to adapt all compression schemes, esp. `RunLengthEncoding` and `DictionaryEncoding` to this design. Since in-memory compression is disabled by default for now, and this PR should be strictly better than before no matter in-memory compression is enabled or not, maybe I'll finish that part in another PR.
    
    TODO
    
    - [ ] Benchmark
    - [ ] Eliminate boxing costs in `RunLengthEncoding`
    - [ ] Eliminate boxing costs in `DictionaryEncoding` (not easy to do without specializing `DictionaryEncoding` for every supported column type)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark prevent-boxing/unboxing

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2327
    
----
commit 7fb1ac67048114b0cf14e7d9bcbf86d544f72fa9
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-08T22:14:33Z

    Made some in-memory columnar storage interfaces row-based

commit e6cf2647789881d3cc7ced7a44407aa467e7a62e
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-08T23:57:31Z

    Removes boxing cost in IntDelta and LongDelta by providing specialized implementations

commit f338236d4ef14c39084df3ff23d1733eaf8cd7db
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-09T01:13:10Z

    Makes ColumnAccessor.extractSingle row based

commit 1d7d1443339e99d17074ef731e8fedb4985d9f63
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-09T01:25:10Z

    Made compression decoder row based

commit 9c5fae6987b283875bd9eaf315cdaebc06abe45a
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-09T01:49:32Z

    Added row based ColumnType.append/extract

commit 269bd78bb3c7efb7ca24d08bade534d459a4f74a
Author: Cheng Lian <li...@gmail.com>
Date:   2014-09-09T02:12:16Z

    Use SpecificMutableRow in InMemoryColumnarTableScan to avoid boxing

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55308174
  
    Nice speed ups.  I think they might be even more pronounced when there are multiple threads fighting for the GC.
    
    Minor comments only.  Will merge after they are addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55205667
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20125/consoleFull) for   PR 2327 at commit [`e5d2cf2`](https://github.com/apache/spark/commit/e5d2cf229ac1d4cc7ff943ae8320c623bb4a2f9b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55249188
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/68/consoleFull) for   PR 2327 at commit [`e5d2cf2`](https://github.com/apache/spark/commit/e5d2cf229ac1d4cc7ff943ae8320c623bb4a2f9b).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
      * `  class Encoder extends compression.Encoder[LongType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] Eliminates boxing costs from...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2327


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55184983
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55064948
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20072/consoleFull) for   PR 2327 at commit [`97bbc4e`](https://github.com/apache/spark/commit/97bbc4ed9b9e3f1b8248db78bd57b6b33a09f2c0).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] Eliminates boxing costs from...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55508421
  
    Thanks! I've merged this to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55188987
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20118/consoleFull) for   PR 2327 at commit [`489f97b`](https://github.com/apache/spark/commit/489f97baa4a80eca7fe61584586992b35969d86f).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2327#discussion_r17438553
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala ---
    @@ -51,10 +51,12 @@ private[sql] abstract class BasicColumnAccessor[T <: DataType, JvmType](
       def hasNext = buffer.hasRemaining
     
       def extractTo(row: MutableRow, ordinal: Int) {
    --- End diff --
    
    This style is going to go away in 2.12 or 2.13 I think.  Should be `:Unit = `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] Eliminates boxing costs from...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2327#discussion_r17513542
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -295,8 +295,16 @@ class HiveQuerySuite extends HiveComparisonTest {
         "SELECT (CASE WHEN key > 2 THEN 3 WHEN 2 > key THEN 2 ELSE 0 END) FROM src WHERE key < 15")
     
       test("implement identity function using case statement") {
    -    val actual = sql("SELECT (CASE key WHEN key THEN key END) FROM src").collect().toSet
    -    val expected = sql("SELECT key FROM src").collect().toSet
    +    val actual = sql("SELECT (CASE key WHEN key THEN key END) FROM src")
    +      .map { case Row(i: Int) => i }
    +      .collect()
    +      .toSeq
    --- End diff --
    
    @liancheng - ping :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55061419
  
    @aarondav to expand on that, as soon as there is any nesting all of our clever tricks for eliminating allocations go out the window.  We can probably improve this in future releases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55213228
  
    **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20125/consoleFull)** after     a configured wait of `120m`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] Eliminates boxing costs from...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55505281
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20278/consoleFull) for   PR 2327 at commit [`4419fe4`](https://github.com/apache/spark/commit/4419fe4e7ad50c2180aecb1fde8ff10a383fa837).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-54955570
  
    Out of curiosity, does this also eliminate boxing for nested data types?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-54929543
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20020/consoleFull) for   PR 2327 at commit [`269bd78`](https://github.com/apache/spark/commit/269bd78bb3c7efb7ca24d08bade534d459a4f74a).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
      * `  class Encoder extends compression.Encoder[LongType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55239710
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/68/consoleFull) for   PR 2327 at commit [`e5d2cf2`](https://github.com/apache/spark/commit/e5d2cf229ac1d4cc7ff943ae8320c623bb4a2f9b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55204983
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/45/consoleFull) for   PR 2327 at commit [`e5d2cf2`](https://github.com/apache/spark/commit/e5d2cf229ac1d4cc7ff943ae8320c623bb4a2f9b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55219180
  
    @marmbrus Please help review this one. `HiveTableScan` is also optimized BTW.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55212049
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/45/consoleFull) for   PR 2327 at commit [`e5d2cf2`](https://github.com/apache/spark/commit/e5d2cf229ac1d4cc7ff943ae8320c623bb4a2f9b).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
      * `  class Encoder extends compression.Encoder[LongType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55202999
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2327#discussion_r17281621
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificRow.scala ---
    @@ -227,7 +227,9 @@ final class SpecificMutableRow(val values: Array[MutableValue]) extends MutableR
         new SpecificMutableRow(newValues)
       }
     
    -  override def update(ordinal: Int, value: Any): Unit = values(ordinal).update(value)
    +  override def update(ordinal: Int, value: Any) {
    +    if (value == null) setNullAt(ordinal) else values(ordinal).update(value)
    +  }
    --- End diff --
    
    This change is submitted separately in #2325 as this PR may take longer time to finish.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55010238
  
    No, unlike Parquet, currently our in-memory columnar format doesn't support complex nested objects well. They are just serialized by Kryo and stored as opaque byte arrays.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] Eliminates boxing costs from...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2327#discussion_r17514159
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -295,8 +295,16 @@ class HiveQuerySuite extends HiveComparisonTest {
         "SELECT (CASE WHEN key > 2 THEN 3 WHEN 2 > key THEN 2 ELSE 0 END) FROM src WHERE key < 15")
     
       test("implement identity function using case statement") {
    -    val actual = sql("SELECT (CASE key WHEN key THEN key END) FROM src").collect().toSet
    -    val expected = sql("SELECT key FROM src").collect().toSet
    +    val actual = sql("SELECT (CASE key WHEN key THEN key END) FROM src")
    +      .map { case Row(i: Int) => i }
    +      .collect()
    +      .toSeq
    --- End diff --
    
    Updated :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55306364
  
    I need to look this over still, but want to remove WIP?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55069727
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20072/consoleFull) for   PR 2327 at commit [`97bbc4e`](https://github.com/apache/spark/commit/97bbc4ed9b9e3f1b8248db78bd57b6b33a09f2c0).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55199753
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20118/consoleFull) for   PR 2327 at commit [`489f97b`](https://github.com/apache/spark/commit/489f97baa4a80eca7fe61584586992b35969d86f).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
      * `  class Encoder extends compression.Encoder[LongType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] WIP: eliminates boxing costs...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2327#discussion_r17439088
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -295,8 +295,16 @@ class HiveQuerySuite extends HiveComparisonTest {
         "SELECT (CASE WHEN key > 2 THEN 3 WHEN 2 > key THEN 2 ELSE 0 END) FROM src WHERE key < 15")
     
       test("implement identity function using case statement") {
    -    val actual = sql("SELECT (CASE key WHEN key THEN key END) FROM src").collect().toSet
    -    val expected = sql("SELECT key FROM src").collect().toSet
    +    val actual = sql("SELECT (CASE key WHEN key THEN key END) FROM src")
    +      .map { case Row(i: Int) => i }
    +      .collect()
    +      .toSeq
    --- End diff --
    
    Changing this from `toSet` to `toSeq` makes the query possibly order dependent.  This could break #2164.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3294][SQL] Eliminates boxing costs from...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2327#issuecomment-55508179
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20278/consoleFull) for   PR 2327 at commit [`4419fe4`](https://github.com/apache/spark/commit/4419fe4e7ad50c2180aecb1fde8ff10a383fa837).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder[T <: NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T] `
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
      * `  class Encoder extends compression.Encoder[LongType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org