Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/04/26 12:59:21 UTC

[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/12699

    [SPARK-14917][SQL] Enable some ORC compressions tests for writing

    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-14917
    
    As described in the JIRA, Hive 1.2.1, which Spark now uses, appears to support `SNAPPY` and `NONE`.
    
    So, this PR enables some tests for writing ORC files with the compression codecs `SNAPPY` and `NONE`.
    
    
    ## How was this patch tested?
    
    Unit tests in `OrcQuerySuite` and `sbt scalastyle`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-14917

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12699.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12699
    
----
commit 012674b7f7a899f220ceab60a8ef17ca8aeeb570
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-04-26T10:48:28Z

    Enable Snappy and None compression tests in ORC

commit 666da8b9c28cede8aba0d0557cbe55b42e74eedd
Author: hyukjinkwon <gu...@gmail.com>
Date:   2016-04-26T10:56:10Z

    Simplify the test logics

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214706781
  
    I'm not familiar with this part of the code




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12699#discussion_r61662731
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
    @@ -169,39 +169,42 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
         }
       }
     
    -  // We only support zlib in Hive 0.12.0 now
    -  test("Default compression options for writing to an ORC file") {
    -    withOrcFile((1 to 100).map(i => (i, s"val_$i"))) { file =>
    -      assertResult(CompressionKind.ZLIB) {
    -        OrcFileOperator.getFileReader(file).get.getCompression
    -      }
    -    }
    -  }
    -
    -  // Following codec is supported in hive-0.13.1, ignore it now
    -  ignore("Other compression options for writing to an ORC file - 0.13.1 and above") {
    +  // Hive supports zlib, snappy and none for Hive 1.2.1.
    +  test("Compression options for writing to an ORC file (SNAPPY, ZLIB and NONE)") {
         val data = (1 to 100).map(i => (i, s"val_$i"))
         val conf = sqlContext.sessionState.hadoopConf
     
    +    withOrcFile(data) { file =>
    +      val expectedCompressionKind =
    +        OrcFileOperator.getFileReader(file).get.getCompression
    +      assert(CompressionKind.ZLIB === expectedCompressionKind)
    +    }
    +
         conf.set(ConfVars.HIVE_ORC_DEFAULT_COMPRESS.varname, "SNAPPY")
    --- End diff --
    
    Sure! Thank you.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214705599
  
    BTW, `OrcHadoopFsRelationSuite` has a somewhat inappropriate test: the default codec is `ZLIB`, but the test `SPARK-13543: Support for specifying compression codec for ORC via option()` exercises the `compression` option by setting its value to `ZlIb`. This does not verify whether the compression is actually applied.
    
    This is being handled in https://github.com/apache/spark/pull/12629, but I am not sure whether it will be merged (or reviewed). I can add the change to this PR.
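
A minimal sketch of the concern above, written without Spark so it stands alone. Every name in it (`CodecOptionCheck`, `resolveCodec`, `ignoringResolveCodec`) is hypothetical and is not part of Spark or Hive; the point is only that requesting the default codec cannot distinguish a writer that honors the option from one that ignores it.

```scala
import java.util.Locale

// Hypothetical stand-ins; none of these names come from Spark or Hive.
object CodecOptionCheck {
  val DefaultCodec = "ZLIB"
  val Supported = Set("NONE", "SNAPPY", "ZLIB")

  // Honors the option, matching the codec name case-insensitively.
  def resolveCodec(options: Map[String, String]): String = {
    val requested = options.getOrElse("compression", DefaultCodec)
    val name = requested.toUpperCase(Locale.ROOT)
    require(Supported.contains(name), s"Unsupported codec: $name")
    name
  }

  // A deliberately broken variant that ignores the option entirely.
  def ignoringResolveCodec(options: Map[String, String]): String = DefaultCodec

  def main(args: Array[String]): Unit = {
    val sameAsDefault = Map("compression" -> "ZlIb")
    val differsFromDefault = Map("compression" -> "snappy")

    // Both variants agree here, so this check cannot detect the bug.
    assert(resolveCodec(sameAsDefault) == ignoringResolveCodec(sameAsDefault))

    // Only a non-default codec tells the two variants apart.
    assert(resolveCodec(differsFromDefault) == "SNAPPY")
    assert(ignoringResolveCodec(differsFromDefault) != "SNAPPY")
  }
}
```

Under these assumptions, a test that sets the option to a codec other than the default is the one that would fail if the option were dropped.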




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12699




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215933632
  
    Thank you @yhuai! 
    Actually, would you mind if I went through the sql/hive tests and submitted a PR that removes the class imports where possible? I won't if that could cause any problem (e.g. being difficult to review or likely to cause conflicts).




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214708459
  
    @rxin Could you please take a look?




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215933980
  
    @yhuai Sure. Thank you.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215929127
  
    **[Test build #57395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57395/consoleFull)** for PR 12699 at commit [`eae8088`](https://github.com/apache/spark/commit/eae8088f22d0cd08738901d05c0ba84ca6a38942).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215933890
  
    How about we do it a little later? It might introduce conflicts with PRs from others.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215926326
  
    **[Test build #57395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57395/consoleFull)** for PR 12699 at commit [`eae8088`](https://github.com/apache/spark/commit/eae8088f22d0cd08738901d05c0ba84ca6a38942).




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215929373
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215929374
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57395/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-215933410
  
    Cool. Thanks! lgtm




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12699#discussion_r61662694
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
    @@ -169,39 +169,42 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
         }
       }
     
    -  // We only support zlib in Hive 0.12.0 now
    -  test("Default compression options for writing to an ORC file") {
    -    withOrcFile((1 to 100).map(i => (i, s"val_$i"))) { file =>
    -      assertResult(CompressionKind.ZLIB) {
    -        OrcFileOperator.getFileReader(file).get.getCompression
    -      }
    -    }
    -  }
    -
    -  // Following codec is supported in hive-0.13.1, ignore it now
    -  ignore("Other compression options for writing to an ORC file - 0.13.1 and above") {
    +  // Hive supports zlib, snappy and none for Hive 1.2.1.
    +  test("Compression options for writing to an ORC file (SNAPPY, ZLIB and NONE)") {
         val data = (1 to 100).map(i => (i, s"val_$i"))
         val conf = sqlContext.sessionState.hadoopConf
     
    +    withOrcFile(data) { file =>
    +      val expectedCompressionKind =
    +        OrcFileOperator.getFileReader(file).get.getCompression
    +      assert(CompressionKind.ZLIB === expectedCompressionKind)
    +    }
    +
         conf.set(ConfVars.HIVE_ORC_DEFAULT_COMPRESS.varname, "SNAPPY")
    --- End diff --
    
    Can you set this as an option on `DataFrameWriter`? It should work now (we propagate the conf to the underlying Hadoop conf when creating the writer).
    
    Also, let's use the string form of the key directly.
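
A toy model of the suggestion above, assuming that per-write options are simply overlaid on the shared configuration when a writer is created. Plain `Map`s stand in for the Hadoop conf and the writer options, and `WriterOptionSketch` and `confForWrite` are hypothetical names, not Spark APIs. The key string is what `ConfVars.HIVE_ORC_DEFAULT_COMPRESS.varname` is believed to resolve to; check it against the Hive version in use.

```scala
// Toy model only: plain Maps stand in for the Hadoop conf and writer options.
object WriterOptionSketch {
  // String form of the key, written out directly (assumed value, see lead-in).
  val CompressKey = "hive.exec.orc.default.compress"

  // Builds the conf for a single write: the shared settings overlaid with the
  // options given for that write. The shared map itself is left unchanged.
  def confForWrite(
      shared: Map[String, String],
      options: Map[String, String]): Map[String, String] =
    shared ++ options

  def main(args: Array[String]): Unit = {
    val shared = Map(CompressKey -> "ZLIB")

    val snappyWrite = confForWrite(shared, Map(CompressKey -> "SNAPPY"))
    val plainWrite = confForWrite(shared, Map.empty)

    // The option applies to the write that asked for it...
    assert(snappyWrite(CompressKey) == "SNAPPY")
    // ...while other writes and the shared conf keep the default.
    assert(plainWrite(CompressKey) == "ZLIB")
    assert(shared(CompressKey) == "ZLIB")
  }
}
```

One plausible benefit of this style over calling `conf.set` on the session-wide conf, as the diff does, is that a codec chosen for one write does not leak into later writes or other tests.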




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214718841
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56994/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214705675
  
    Could I maybe cc @srowen please?




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214718656
  
    **[Test build #56994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56994/consoleFull)** for PR 12699 at commit [`666da8b`](https://github.com/apache/spark/commit/666da8b9c28cede8aba0d0557cbe55b42e74eedd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214718837
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14917][SQL] Enable some ORC compression...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12699#issuecomment-214703140
  
    **[Test build #56994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56994/consoleFull)** for PR 12699 at commit [`666da8b`](https://github.com/apache/spark/commit/666da8b9c28cede8aba0d0557cbe55b42e74eedd).

