Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2016/03/02 06:46:34 UTC

[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/11461

    [SPARK-13607][SQL] Improve compression performance for integer-typed values on cache

    ## What changes were proposed in this pull request?
    This PR improves the compression of integer-typed values in the in-memory columnar cache to reduce GC pressure.
    The goal is to bring the in-memory cache size closer to the size of Parquet-formatted data on disk. Because Spark uses simpler compression algorithms in `compressionSchemes` than Parquet does,
    the in-memory columnar cache is much larger than the same data stored as Parquet on disk. In one use case (see https://www.mail-archive.com/user@spark.apache.org/msg45241.html), 24.59GB of Parquet data on disk becomes 41.7GB in the cache. This PR uses the bit packers implemented in parquet-column, which Spark already has as a package dependency.
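
For readers unfamiliar with the technique, the following is a minimal Python sketch of delta encoding followed by bit packing. It is not the code in this pull request and it does not reproduce the Parquet on-disk format; the function names `encode_block` and `decode_block` and the little-endian layout are assumptions made for illustration only. Python integers are unbounded, so 32-bit overflow is not modeled.

```python
def encode_block(values):
    """Delta-encode a non-empty list of ints and bit-pack the deltas.

    Returns (first_value, min_delta, bit_width, packed_bytes).
    Hypothetical layout chosen for illustration, not Parquet's.
    """
    if not values:
        raise ValueError("values must not be empty")
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return first, 0, 0, b""
    # Subtract the minimum delta so that every stored value is non-negative.
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    # Every adjusted delta is stored with the width of the largest one.
    width = max(adjusted).bit_length()
    acc = 0
    for i, d in enumerate(adjusted):
        acc |= d << (i * width)
    nbytes = (len(adjusted) * width + 7) // 8
    return first, min_delta, width, acc.to_bytes(nbytes, "little")


def decode_block(first, min_delta, width, packed, count):
    """Inverse of encode_block; count is the number of original values."""
    acc = int.from_bytes(packed, "little")
    mask = (1 << width) - 1
    out = [first]
    for i in range(count - 1):
        d = (acc >> (i * width)) & mask
        out.append(out[-1] + d + min_delta)
    return out


values = [100, 103, 105, 110, 111]
first, min_delta, width, packed = encode_block(values)
restored = decode_block(first, min_delta, width, packed, len(values))
```

In this example the four deltas fit in 3 bits each, so they occupy 2 bytes instead of the 16 bytes needed for four plain 32-bit integers. Sorted or slowly changing columns benefit most, because their deltas are small.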
    
    ## How was this patch tested?
    Added `DeltaBinaryPackingSuite`, which tests compression with various input patterns.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark BinaryPackingSpike

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11461
    
----
commit d443e90c3b623edd3dad51353ccbe2448f30db0d
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-02-23T05:23:41Z

    Implement IntDeltaBinaryPacking in CompressionSchemes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202715552
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54410/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202659666
  
    **[Test build #54402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54402/consoleFull)** for PR 11461 at commit [`df20362`](https://github.com/apache/spark/commit/df20362fbefc23b58761c1843fed33bcee0e339d).



[GitHub] spark issue #11461: [SPARK-13607][SQL] Improve compression performance for i...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/11461
  
    I think this improvement is not always necessary, so I'll close this for now. Thanks.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202687485
  
    **[Test build #54402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54402/consoleFull)** for PR 11461 at commit [`df20362`](https://github.com/apache/spark/commit/df20362fbefc23b58761c1843fed33bcee0e339d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-192926853
  
    @nongli @rxin ping



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202633270
  
    Can you include the benchmark code as well?
    
    What do the numbers mean? For example: IntDeltaBinaryPacking(0.182). Does this mean it is 18% of the original size?



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11461#discussion_r57654201
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -530,3 +532,283 @@ private[columnar] case object LongDelta extends CompressionScheme {
         }
       }
     }
    +
    +/**
    + * Writes integral-type values with delta encoding and binary packing.
    + * The format is as follows:
    --- End diff --
    
    How does this relate to the parquet spec/implementation?



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202715170
  
    **[Test build #54410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54410/consoleFull)** for PR 11461 at commit [`ae80adb`](https://github.com/apache/spark/commit/ae80adbc438909af124d745775bf6bf20798de71).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191076135
  
    **[Test build #52296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52296/consoleFull)** for PR 11461 at commit [`d443e90`](https://github.com/apache/spark/commit/d443e90c3b623edd3dad51353ccbe2448f30db0d).



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202695897
  
    @nongli okay, fixed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202688055
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191077441
  
    I ran another quick check using the `inventory` table in TPC-DS (scale=1), which has the schema
    (inv_date_sk INT, inv_item_sk INT, inv_warehouse_sk INT, inv_quantity_on_hand INT):
    ```
    parquet size on disk: 15.0MB
    IntDelta: 37.3MB
    IntDeltaBinaryPacking: 20.7MB
    ```
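
To put the three sizes above in proportion, the short Python calculation below derives the ratios between them. It uses only the reported figures; the variable names are chosen here for illustration.

```python
# Sizes reported above for the TPC-DS table, in MB.
parquet_mb = 15.0
int_delta_mb = 37.3
int_delta_binary_packing_mb = 20.7

# Cache size relative to Parquet on disk, per scheme.
int_delta_vs_parquet = int_delta_mb / parquet_mb
packing_vs_parquet = int_delta_binary_packing_mb / parquet_mb

# Fraction of the IntDelta cache size saved by IntDeltaBinaryPacking.
saving_vs_int_delta = 1 - int_delta_binary_packing_mb / int_delta_mb

print(round(int_delta_vs_parquet, 2))  # 2.49
print(round(packing_vs_parquet, 2))    # 1.38
print(round(saving_vs_int_delta, 3))   # 0.445
```

On these numbers the cache shrinks from about 2.5 times the Parquet size to about 1.4 times, a reduction of roughly 45% relative to `IntDelta`.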



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-205704784
  
    @nongli ping



[GitHub] spark issue #11461: [SPARK-13607][SQL] Improve compression performance for i...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/11461
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202339211
  
    @nongli @rxin ping



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202653031
  
    Yes, you're right; the number is the compressed size as a fraction of the original size.
    Okay, I'll add the benchmark code, too.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-217120602
  
    @rxin okay, I'll keep this.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191109490
  
    **[Test build #52296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52296/consoleFull)** for PR 11461 at commit [`d443e90`](https://github.com/apache/spark/commit/d443e90c3b623edd3dad51353ccbe2448f30db0d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Encoder extends compression.Encoder[IntegerType.type] `
      * `  class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`



[GitHub] spark issue #11461: [SPARK-13607][SQL] Improve compression performance for i...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/11461
  
    **[Test build #59765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59765/consoleFull)** for PR 11461 at commit [`ae80adb`](https://github.com/apache/spark/commit/ae80adbc438909af124d745775bf6bf20798de71).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11461#discussion_r57661684
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -530,3 +532,283 @@ private[columnar] case object LongDelta extends CompressionScheme {
         }
       }
     }
    +
    +/**
    + * Writes integral-type values with delta encoding and binary packing.
    + * The format is as follows:
    --- End diff --
    
    I used the Parquet spec and implementation only as a reference; parts of the implementation are modified so that the compressed size can be computed easily in `gatherCompressibilityStats`.
    https://github.com/Parquet/parquet-mr/blob/master/parquet-column/src/main/java/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L39
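
As an illustration of why a format might be adjusted to make size estimation easy, here is a hedged Python sketch that accumulates an estimated compressed size one value at a time without producing any encoded bytes. It is not the logic in this pull request: the class name `DeltaPackingSizeEstimator`, the block size, and the per-block header of 4 bytes for the minimum delta plus 1 byte for the bit width are all assumptions.

```python
class DeltaPackingSizeEstimator:
    """Estimates the size of delta + bit-packed 32-bit ints, block by block.

    Hypothetical layout: the first value is stored verbatim (4 bytes);
    each block of deltas costs a 5-byte header plus the packed payload.
    """

    HEADER_BYTES = 4 + 1  # assumed: min delta (4 bytes) + bit width (1 byte)

    def __init__(self, block_size=128):
        self.block_size = block_size
        self.prev = None
        self.deltas = []            # deltas of the block being filled
        self.uncompressed_size = 0  # bytes if stored as plain 32-bit ints
        self.compressed_size = 0    # bytes for blocks completed so far

    def _block_cost(self, deltas):
        if not deltas:
            return 0
        width = (max(deltas) - min(deltas)).bit_length()
        return self.HEADER_BYTES + (len(deltas) * width + 7) // 8

    def gather(self, value):
        """Accounts for one more value, analogous to a per-row stats call."""
        self.uncompressed_size += 4
        if self.prev is None:
            self.compressed_size += 4
        else:
            self.deltas.append(value - self.prev)
            if len(self.deltas) == self.block_size:
                self.compressed_size += self._block_cost(self.deltas)
                self.deltas = []
        self.prev = value

    def ratio(self):
        """Estimated compressed size / uncompressed size, including any partial block."""
        total = self.compressed_size + self._block_cost(self.deltas)
        return total / self.uncompressed_size


estimator = DeltaPackingSizeEstimator(block_size=4)
for v in [10, 11, 12, 13, 14]:
    estimator.gather(v)
estimated_ratio = estimator.ratio()
```

A scheme selector can compare such estimated ratios across candidate encodings and then encode only with the one it picks, which is the reason a cheap per-value estimate is useful.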




[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-200240112
  
    @nongli @rxin ping



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202715550
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #11461: [SPARK-13607][SQL] Improve compression performance for i...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/11461
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59765/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-213352754
  
    @nongli ping



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191109742
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191153285
  
    @nongli @rxin Could you give me comments on this?



[GitHub] spark issue #11461: [SPARK-13607][SQL] Improve compression performance for i...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/11461
  
    **[Test build #59765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59765/consoleFull)** for PR 11461 at commit [`ae80adb`](https://github.com/apache/spark/commit/ae80adbc438909af124d745775bf6bf20798de71).



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191109746
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52296/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202695912
  
    **[Test build #54410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54410/consoleFull)** for PR 11461 at commit [`ae80adb`](https://github.com/apache/spark/commit/ae80adbc438909af124d745775bf6bf20798de71).



[GitHub] spark pull request #11461: [SPARK-13607][SQL] Improve compression performanc...

Posted by maropu <gi...@git.apache.org>.

Github user maropu closed the pull request at:

    https://github.com/apache/spark/pull/11461



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-202688057
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54402/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-191076647
  
    The benchmark results are as follows:
    ```
    Running benchmark: INT Decode(Lower Skew)
      Running case: PassThrough(1.000)
      Running case: RunLengthEncoding(1.002)
      Running case: DictionaryEncoding(0.500)
      Running case: IntDelta(0.250)
      Running case: IntDeltaBinaryPacking(0.068)
    
    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    INT Decode(Lower Skew):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    PassThrough(1.000)                        285 /  360        235.7           4.2       1.0X
    RunLengthEncoding(1.002)                  700 /  715         95.8          10.4       0.4X
    DictionaryEncoding(0.500)                 763 /  782         88.0          11.4       0.4X
    IntDelta(0.250)                           684 /  702         98.1          10.2       0.4X
    IntDeltaBinaryPacking(0.068)              805 /  811         83.4          12.0       0.4X
    
    Running benchmark: INT Decode(Higher Skew)
      Running case: PassThrough(1.000)
      Running case: RunLengthEncoding(1.337)
      Running case: DictionaryEncoding(0.501)
      Running case: IntDelta(0.250)
      Running case: IntDeltaBinaryPacking(0.182)
    
    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    INT Decode(Higher Skew):            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -------------------------------------------------------------------------------------------
    PassThrough(1.000)                        690 /  716         97.3          10.3       1.0X
    RunLengthEncoding(1.337)                 1127 / 1148         59.5          16.8       0.6X
    DictionaryEncoding(0.501)                 836 /  856         80.2          12.5       0.8X
    IntDelta(0.250)                           763 /  778         88.0          11.4       0.9X
    IntDeltaBinaryPacking(0.182)              873 /  884         76.9          13.0       0.8X
    ```



[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-217087851
  
    @maropu sorry for the delay. I think we do want to revisit this in Spark 2.1. Let's keep the pull request open and revisit this after 2.0.




[GitHub] spark pull request: [SPARK-13607][SQL] Improve compression perform...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11461#issuecomment-209807378
  
    @nongli ping

