Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/02/14 18:26:29 UTC

[GitHub] spark pull request #20610: Use 'hive' for ORC

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/20610

    Use 'hive' for ORC

    ## What changes were proposed in this pull request?
    
    This PR changes the default ORC implementation to `hive`, as in Spark 2.2.X.
    
    ## How was this patch tested?
    
    Pass all test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-ORC-DISABLE

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20610.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20610
    
----
commit 2d74b204b85db1ffcfb164a160e8f6f0d02d3f4b
Author: Dongjoon Hyun <do...@...>
Date:   2018-02-14T18:25:19Z

    Use 'hive' for ORC

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20610: Use 'hive' for ORC

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87450/testReport)** for PR 20610 at commit [`2d74b20`](https://github.com/apache/spark/commit/2d74b204b85db1ffcfb164a160e8f6f0d02d3f4b).


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87460/testReport)** for PR 20610 at commit [`2769633`](https://github.com/apache/spark/commit/276963369b663c99aebcde10f4329f479f43b3ea).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Thank you, @gatorsmile , @cloud-fan , and @viirya .


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87453/testReport)** for PR 20610 at commit [`46c8697`](https://github.com/apache/spark/commit/46c8697b1981f57eeacb48bea31dec1e89f4e66a).


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168347455
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -207,6 +207,16 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
           .collect { case s @ StreamingRelation(dataSource, _, _) => s.schema }.head
       }
     
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +    spark.sessionState.conf.setConf(SQLConf.ORC_IMPLEMENTATION, "native")
    +  }
    +
    +  override def afterAll(): Unit = {
    +    spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
    +    super.afterAll()
    --- End diff --
    
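    ```
      override def afterAll(): Unit = {
        try {
          spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
        } finally {
          // Always run the parent teardown, even if unsetting the conf throws.
          super.afterAll()
        }
      }
    ```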
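
A minimal sketch of why the `try`/`finally` form matters, using hypothetical stand-in classes rather than Spark's real test traits: if the cleanup step throws, the parent `afterAll()` still runs before the exception propagates. `BaseSuite`, `GuardedSuite`, and the `events` log are assumptions made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

object AfterAllCleanupSketch {
  // Records the order in which teardown steps run.
  val events = ListBuffer.empty[String]

  // Hypothetical stand-in for the parent test trait.
  trait BaseSuite {
    def afterAll(): Unit = events += "super.afterAll"
  }

  // The cleanup step fails here, standing in for a failing unsetConf call.
  class GuardedSuite extends BaseSuite {
    override def afterAll(): Unit = {
      try {
        events += "unsetConf"
        throw new IllegalStateException("cleanup failed")
      } finally {
        // Runs even though the cleanup above threw.
        super.afterAll()
      }
    }
  }

  def run(): List[String] = {
    events.clear()
    try {
      new GuardedSuite().afterAll()
    } catch {
      case _: IllegalStateException => events += "exception propagated"
    }
    events.toList
  }
}
```

Calling `AfterAllCleanupSketch.run()` records the cleanup attempt, then the parent teardown, and only then the propagated exception.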


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87460/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC implementation for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/895/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/909/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168322506
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -20,14 +20,26 @@ package org.apache.spark.sql
     import java.io.FileNotFoundException
     
     import org.apache.hadoop.fs.Path
    +import org.scalatest.BeforeAndAfterAll
     
     import org.apache.spark.SparkException
     import org.apache.spark.sql.internal.SQLConf
     import org.apache.spark.sql.test.SharedSQLContext
     
    -class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext {
    +
    +class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll {
       import testImplicits._
     
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +    spark.sessionState.conf.setConf(SQLConf.ORC_IMPLEMENTATION, "native")
    +  }
    +
    +  override def afterAll(): Unit = {
    +    spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
    +    super.afterAll()
    +  }
    --- End diff --
    
    The test coverage is the same.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/915/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Ur, can I make another PR to fix the test failures?
    ```
    Error Message
    org.apache.spark.sql.AnalysisException: Hive built-in ORC data source must be used with Hive support enabled. Please use the native ORC data source by setting 'spark.sql.orc.impl' to 'native';
    ```


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87475/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC implementation for Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87471/testReport)** for PR 20610 at commit [`183ec21`](https://github.com/apache/spark/commit/183ec213b02ad528cb016e67ecc2bfb6394668f1).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168395230
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    --- End diff --
    
    Remove ` which is used prior to Spark 2.3`?


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87471/testReport)** for PR 20610 at commit [`183ec21`](https://github.com/apache/spark/commit/183ec213b02ad528cb016e67ecc2bfb6394668f1).


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168347783
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,24 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC serde table (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    --- End diff --
    
    `table ` -> `tables `


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168416253
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
    @@ -33,6 +33,19 @@ import org.apache.spark.util.Utils
     class FileStreamSinkSuite extends StreamTest {
       import testImplicits._
     
    +  override def beforeAll(): Unit = {
    --- End diff --
    
    nit: a simpler way to fix this:
    ```
    override val conf = new SQLConf().copy(SQLConf.ORC_IMPLEMENTATION -> "native")
    ```


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87450/
    Test FAILed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87451/testReport)** for PR 20610 at commit [`7ff4ccf`](https://github.com/apache/spark/commit/7ff4ccf115437dadcda761890522a38960c5fde6).


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Retest this please.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87471/
    Test FAILed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168271532
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,24 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC serde table (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    +  </tr>
    +  <tr>
    +    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    +    <td><code>true</code></td>
    +    <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
    +  </tr>
    +</table>
    --- End diff --
    
    @gatorsmile . Now, this becomes a section.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC implementation ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168268059
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1784,7 +1784,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
           <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
           <tr>
             <td><code>spark.sql.orc.impl</code></td>
    -        <td><code>native</code></td>
    +        <td><code>hive</code></td>
    --- End diff --
    
    Yep.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/912/
    Test PASSed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168347968
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,24 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC serde table (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    +  </tr>
    +  <tr>
    +    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    +    <td><code>true</code></td>
    +    <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
    +  </tr>
    +</table>
    --- End diff --
    
    Thanks!


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/898/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87453/testReport)** for PR 20610 at commit [`46c8697`](https://github.com/apache/spark/commit/46c8697b1981f57eeacb48bea31dec1e89f4e66a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with BeforeAndAfterAll `


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168365935
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    --- End diff --
    
    When `spark.sql.hive.convertMetastoreOrc` is (also) set to `true`?
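
One way to read the documented rule, sketched as a pure function. It assumes the doc paragraph under review is the whole story: native ORC tables need `spark.sql.orc.impl` set to `native` and `spark.sql.orc.enableVectorizedReader` set to `true`, and Hive ORC serde tables need `spark.sql.hive.convertMetastoreOrc` set to `true` as well. `OrcSettings`, `TableKind`, and `usesVectorizedReader` are hypothetical names for illustration, not Spark APIs.

```scala
object OrcReaderChoiceSketch {
  // Hypothetical model of the three settings named in the doc text.
  final case class OrcSettings(
      impl: String,                    // spark.sql.orc.impl
      enableVectorizedReader: Boolean, // spark.sql.orc.enableVectorizedReader
      convertMetastoreOrc: Boolean)    // spark.sql.hive.convertMetastoreOrc

  sealed trait TableKind
  // e.g. a table created with USING ORC
  case object NativeOrcTable extends TableKind
  // e.g. a table created with USING HIVE OPTIONS (fileFormat 'ORC')
  case object HiveSerdeOrcTable extends TableKind

  // Encodes the rule as worded in the doc paragraph, nothing more.
  def usesVectorizedReader(kind: TableKind, s: OrcSettings): Boolean = {
    val nativeVectorized = s.impl == "native" && s.enableVectorizedReader
    kind match {
      case NativeOrcTable    => nativeVectorized
      case HiveSerdeOrcTable => nativeVectorized && s.convertMetastoreOrc
    }
  }
}
```

Under this reading, the "also" is what distinguishes the two cases: the Hive serde branch keeps both native-table conditions and adds a third.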


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87453/
    Test PASSed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by raam86 <gi...@git.apache.org>.
Github user raam86 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r177712534
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1784,7 +1784,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
           <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
           <tr>
             <td><code>spark.sql.orc.impl</code></td>
    -        <td><code>native</code></td>
    +        <td><code>hive</code></td>
    --- End diff --
    
    Is there a reason the `impl` was changed back to the old implementation? This breaks `spark.read.orc`.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC implementation ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168267722
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -399,11 +399,11 @@ object SQLConf {
     
       val ORC_IMPLEMENTATION = buildConf("spark.sql.orc.impl")
         .doc("When native, use the native version of ORC support instead of the ORC library in Hive " +
    -      "1.2.1. It is 'hive' by default prior to Spark 2.3.")
    +      "1.2.1. It is 'hive' by default.")
         .internal()
         .stringConf
         .checkValues(Set("hive", "native"))
    -    .createWithDefault("native")
    +    .createWithDefault("hive")
    --- End diff --
    
    We also need to disable the ORC pushdown, because the ORC reader of Hive 1.2.1 has a few bugs. 
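
Editor's sketch of the combined change discussed here: the default implementation reverts to `hive`, and ORC predicate pushdown is off unless a user opts in. This is a plain-Scala model, not Spark's `SQLConf`. The object `OrcDefaults` and its `resolve` helper are hypothetical names, and the `false` default for `spark.sql.orc.filterPushdown` is an assumption taken from the PR title, not from a quoted diff.

```scala
// Plain-Scala model of the two ORC settings; hypothetical, not Spark's SQLConf.
object OrcDefaults {
  // Allowed values mirror `.checkValues(Set("hive", "native"))` in the quoted diff.
  val allowedImpls: Set[String] = Set("hive", "native")

  // Defaults as this PR appears to leave them.
  // The pushdown default is an assumption based on the PR title ("disable PPD").
  val defaults: Map[String, String] = Map(
    "spark.sql.orc.impl" -> "hive",
    "spark.sql.orc.filterPushdown" -> "false"
  )

  // Merge user overrides over the defaults and reject unknown implementations.
  def resolve(overrides: Map[String, String]): Map[String, String] = {
    val merged = defaults ++ overrides
    val impl = merged("spark.sql.orc.impl")
    require(
      allowedImpls.contains(impl),
      s"spark.sql.orc.impl must be one of $allowedImpls, got '$impl'")
    merged
  }
}
```

Under this model, a user who wants the new reader with pushdown would override both keys, for example `OrcDefaults.resolve(Map("spark.sql.orc.impl" -> "native", "spark.sql.orc.filterPushdown" -> "true"))`.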


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87451/testReport)** for PR 20610 at commit [`7ff4ccf`](https://github.com/apache/spark/commit/7ff4ccf115437dadcda761890522a38960c5fde6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    LGTM
    
    Thanks! Merged to master/2.3


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168355201
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    +  </tr>
    +  <tr>
    +    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    +    <td><code>true</code></td>
    +    <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
    +  </tr>
    +</table>
    +
    --- End diff --
    
    Yes. It has been disabled again. @viirya


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168360556
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,24 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC serde table (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    --- End diff --
    
    Er, there are multiple occurrences of `is set to true`. Which one do you mean?


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87468/testReport)** for PR 20610 at commit [`183ec21`](https://github.com/apache/spark/commit/183ec213b02ad528cb016e67ecc2bfb6394668f1).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20610


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168355081
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    +  </tr>
    +  <tr>
    +    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    +    <td><code>true</code></td>
    +    <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
    +  </tr>
    +</table>
    +
    --- End diff --
    
    Has the description of `spark.sql.orc.filterPushdown` disappeared?


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87450/testReport)** for PR 20610 at commit [`2d74b20`](https://github.com/apache/spark/commit/2d74b204b85db1ffcfb164a160e8f6f0d02d3f4b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168371933
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    --- End diff --
    
    Thank you. I see.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87468/
    Test FAILed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87460/testReport)** for PR 20610 at commit [`2769633`](https://github.com/apache/spark/commit/276963369b663c99aebcde10f4329f479f43b3ea).


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168354972
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -207,6 +207,16 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
           .collect { case s @ StreamingRelation(dataSource, _, _) => s.schema }.head
       }
     
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +    spark.sessionState.conf.setConf(SQLConf.ORC_IMPLEMENTATION, "native")
    +  }
    +
    +  override def afterAll(): Unit = {
    +    spark.sessionState.conf.unsetConf(SQLConf.ORC_IMPLEMENTATION)
    +    super.afterAll()
    --- End diff --
    
    Thanks. Yep. It's done.
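
Editor's sketch of the set-then-unset pattern in the quoted `beforeAll`/`afterAll` diff, written as a scoped override in plain Scala. `ConfScope`, its `conf` map, and `withConf` are hypothetical stand-ins for a session configuration. The sketch does not use Spark's API and is not claimed to be what the test suites do internally.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session's mutable configuration; not Spark's SQLConf.
object ConfScope {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set `key` while `body` runs, then restore the previous state,
  // even if `body` throws.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally {
      previous match {
        case Some(old) => conf(key) = old   // put the earlier value back
        case None      => conf.remove(key)  // key was unset before, so unset it again
      }
    }
  }
}
```

The `finally` block plays the role of `afterAll`: the override never leaks into later tests, which matters when the override (here `native`) differs from the session default (`hive`).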


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC implementation ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168267941
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1784,7 +1784,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
           <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
           <tr>
             <td><code>spark.sql.orc.impl</code></td>
    -        <td><code>native</code></td>
    +        <td><code>hive</code></td>
    --- End diff --
    
    We do not need this in the migration guide. Please create a new section for ORC.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    I think it makes sense to fix the test cases in the same PR, as long as they are not bug fixes. 


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168531711
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
    @@ -33,6 +33,19 @@ import org.apache.spark.util.Utils
     class FileStreamSinkSuite extends StreamTest {
       import testImplicits._
     
    +  override def beforeAll(): Unit = {
    --- End diff --
    
    Hi, @cloud-fan.
    I tested it, but that doesn't work in this `FileStreamSinkSuite`.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168354630
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,24 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC serde table (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    --- End diff --
    
    is set to `true` -> is also set to `true`?
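
Editor's sketch of why the word "also" matters in the documentation paragraph: for Hive ORC serde tables, `spark.sql.hive.convertMetastoreOrc` is an additional condition on top of the two that apply to native ORC tables. The code encodes only the rule as the quoted paragraph states it. `OrcReaderRule`, `OrcConf`, and `usesVectorizedReader` are hypothetical names, and the sketch is not Spark's planner logic.

```scala
// Plain-Scala encoding of the rule in the quoted documentation paragraph.
object OrcReaderRule {
  sealed trait TableKind
  case object NativeOrcTable extends TableKind     // e.g. created with `USING ORC`
  case object HiveSerdeOrcTable extends TableKind  // e.g. `USING HIVE OPTIONS (fileFormat 'ORC')`

  // Fields correspond to spark.sql.orc.impl, spark.sql.orc.enableVectorizedReader,
  // and spark.sql.hive.convertMetastoreOrc.
  final case class OrcConf(
      impl: String,
      enableVectorizedReader: Boolean,
      convertMetastoreOrc: Boolean)

  def usesVectorizedReader(kind: TableKind, conf: OrcConf): Boolean = {
    // Conditions shared by both table kinds.
    val nativeVectorized = conf.impl == "native" && conf.enableVectorizedReader
    kind match {
      case NativeOrcTable    => nativeVectorized
      // Hive serde tables need the conversion flag as well.
      case HiveSerdeOrcTable => nativeVectorized && conf.convertMetastoreOrc
    }
  }
}
```

With `impl = "native"` and the vectorized reader enabled, a Hive serde table still falls back to the non-vectorized path in this model unless `convertMetastoreOrc` is `true`.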


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87475/testReport)** for PR 20610 at commit [`19b50b1`](https://github.com/apache/spark/commit/19b50b1eb5dcdf02ecd515b5d27d0256c7f4a3ab).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    No problem.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/904/
    Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87451/
    Test FAILed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87475/testReport)** for PR 20610 at commit [`19b50b1`](https://github.com/apache/spark/commit/19b50b1eb5dcdf02ecd515b5d27d0256c7f4a3ab).


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    **[Test build #87468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87468/testReport)** for PR 20610 at commit [`183ec21`](https://github.com/apache/spark/commit/183ec213b02ad528cb016e67ecc2bfb6394668f1).


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC implementation ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168267868
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -399,11 +399,11 @@ object SQLConf {
     
       val ORC_IMPLEMENTATION = buildConf("spark.sql.orc.impl")
         .doc("When native, use the native version of ORC support instead of the ORC library in Hive " +
    -      "1.2.1. It is 'hive' by default prior to Spark 2.3.")
    +      "1.2.1. It is 'hive' by default.")
         .internal()
         .stringConf
         .checkValues(Set("hive", "native"))
    -    .createWithDefault("native")
    +    .createWithDefault("hive")
    --- End diff --
    
    Oh, right.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168360639
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    --- End diff --
    
    @viirya, I split it into multiple lines. Could you point it out once more?


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168401130
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,29 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
    +To do that, the following configurations are newly added. The vectorized reader is used for the
    +native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
    +is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For the Hive ORC
    +serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
    +the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. `hive` means the ORC library in Hive 1.2.1 which is used prior to Spark 2.3.</td>
    --- End diff --
    
    Thanks!


---



[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20610
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/896/
    Test PASSed.


---
