Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/10/03 21:12:14 UTC

[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/22622

    [SPARK-25635][SQL][BUILD] Support selective direct encoding in native ORC write

    ## What changes were proposed in this pull request?
    
    
    Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold` were applied to all columns, which has been a big hurdle to enabling dictionary encoding. Since ORC 1.5.3, `orc.column.encoding.direct` can enforce direct encoding selectively, column by column. This PR aims to add that feature by upgrading ORC from 1.5.2 to 1.5.3.
    
    The following are the patches in ORC 1.5.3; this feature is the only one directly related to Spark.
    ```
    ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
    ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
    ORC-405: Remove calcite as a dependency from the benchmarks.
    ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
    ORC-383: Parallel builds fails with ConcurrentModificationException
    ORC-382: Apache rat exclusions + add rat check to travis
    ORC-401: Fix incorrect quoting in specification.
    ORC-385: Change RecordReader to extend Closeable.
    ORC-384: [C++] fix memory leak when loading non-ORC files
    ORC-391: [c++] parseType does not accept underscore in the field name
    ORC-397: Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
    ORC-389: Add ability to not decode Acid metadata columns
    ```
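
As an illustration of the option described above, the following is a minimal sketch of a DataFrame write that keeps dictionary encoding enabled globally while forcing direct encoding for one column. It assumes Spark 2.4 or later with the native ORC implementation and ORC 1.5.3+ on the classpath; the sample data, the application name, and the temporary output path are hypothetical and not part of this PR.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: run as a Scala script or pasted into spark-shell.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("orc-selective-direct-encoding") // hypothetical name
  .config("spark.sql.orc.impl", "native")   // use the native ORC writer
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: `uniqColumn` holds near-unique values,
// so a dictionary would bring little benefit for it.
val df = Seq(
  ("94086", "uuid-0001", 0.0),
  ("94086", "uuid-0002", 1.0),
  ("94087", "uuid-0003", 2.0)
).toDF("zipcode", "uniqColumn", "value")

// Write into a fresh sub-path of a temporary directory.
val outputPath = Files.createTempDirectory("orc-direct").resolve("out").toString

df.write
  .format("orc")
  // Allow dictionary encoding regardless of the distinct-key ratio ...
  .option("orc.dictionary.key.threshold", "1.0")
  // ... but request direct encoding for the listed column.
  .option("orc.column.encoding.direct", "uniqColumn")
  .save(outputPath)

// Read the data back to confirm the write succeeded.
val rowsWritten = spark.read.orc(outputPath).count()
```

The option values mirror the ones used in the test cases of this PR; reading the data back only confirms the write, not the encoding chosen per column.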
    
    ## How was this patch tested?
    
    Pass Jenkins with the newly added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-25635

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22622
    
----
commit 39b7fd63c4ce5cbe6dc628ffb0170aef361461ef
Author: Dongjoon Hyun <do...@...>
Date:   2018-10-03T19:03:44Z

    [SPARK-25635][SQL][BUILD] Support selective direct encoding in native ORC write

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96907/
    Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222535254
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -284,4 +350,8 @@ class OrcSourceSuite extends OrcSuite with SharedSQLContext {
       test("Check BloomFilter creation") {
         testBloomFilterCreation(Kind.BLOOM_FILTER_UTF8) // After ORC-101
       }
    +
    +  test("Enforce direct encoding column-wise selectively") {
    +    testSelectiveDictionaryEncoding(true)
    --- End diff --
    
    How about `testSelectiveDictionaryEncoding(isSelective = true)`?


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3696/
    Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222868495
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala ---
    @@ -182,4 +182,12 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
           }
         }
       }
    +
    +  test("Enforce direct encoding column-wise selectively") {
    +    Seq(true, false).foreach { convertMetastore =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") {
    +        testSelectiveDictionaryEncoding(isSelective = false)
    --- End diff --
    
    When we change the Spark behavior later, this test will be adapted accordingly.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96964/
    Test FAILed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222535553
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,71 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uuid STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uuid'
    --- End diff --
    
    How about changing the column name? I thought it was some kind of enum representing the encoding.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96907/testReport)** for PR 22622 at commit [`39b7fd6`](https://github.com/apache/spark/commit/39b7fd63c4ce5cbe6dc628ffb0170aef361461ef).


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96977/testReport)** for PR 22622 at commit [`70016e4`](https://github.com/apache/spark/commit/70016e4896a42315e82141ac995bf12eded07f51).


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222868074
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    +               |)
    +            """.stripMargin
    +          case "hive" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |STORED AS ORC
    +               |LOCATION '${dir.toURI}'
    +               |TBLPROPERTIES (
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  hive.exec.orc.dictionary.key.size.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    +               |)
    +            """.stripMargin
    +          case impl =>
    +            throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
    +        }
    +
    +        sql(sqlStatement)
    +        sql(s"INSERT INTO $tableName VALUES ('94086', 'random-uuid-string', 0.0)")
    +
    +        val partFiles = dir.listFiles()
    +          .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
    +        assert(partFiles.length === 1)
    +
    +        val orcFilePath = new Path(partFiles.head.getAbsolutePath)
    +        val readerOptions = OrcFile.readerOptions(new Configuration())
    +        val reader = OrcFile.createReader(orcFilePath, readerOptions)
    +        var recordReader: RecordReaderImpl = null
    +        try {
    +          recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
    +
    +          // Check the kind
    +          val stripe = recordReader.readStripeFooter(reader.getStripes.get(0))
    +          assert(stripe.getColumns(1).getKind === DICTIONARY_V2)
    --- End diff --
    
    Sure!


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3662/
    Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22622


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Retest this please.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96949/
    Test FAILed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Thank you, @HyukjinKwon !


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3701/
    Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r223077569
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    Maybe dictionary encoding could be a good candidate: `parquet.enable.dictionary`, `orc.dictionary.key.threshold`, et al.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3688/
    Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96963/testReport)** for PR 22622 at commit [`70016e4`](https://github.com/apache/spark/commit/70016e4896a42315e82141ac995bf12eded07f51).


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222750086
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,71 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uuid STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uuid'
    +               |)
    +            """.stripMargin
    +          case "hive" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uuid STRING, value DOUBLE)
    +               |STORED AS ORC
    +               |LOCATION '${dir.toURI}'
    +               |TBLPROPERTIES (
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  hive.exec.orc.dictionary.key.size.threshold '1.0',
    +               |  orc.column.encoding.direct 'uuid'
    +               |)
    +            """.stripMargin
    +          case impl =>
    +            throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
    +        }
    +
    +        sql(sqlStatement)
    +        sql(s"INSERT INTO $tableName VALUES ('94086', 'random-uuid-string', 0.0)")
    +
    +        val partFiles = dir.listFiles()
    +          .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
    +        assert(partFiles.length === 1)
    +
    +        val orcFilePath = new Path(partFiles.head.getAbsolutePath)
    +        val readerOptions = OrcFile.readerOptions(new Configuration())
    +        val reader = OrcFile.createReader(orcFilePath, readerOptions)
    +        var recordReader: RecordReaderImpl = null
    +        try {
    +          recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
    +
    +          // Check the kind
    +          val stripe = recordReader.readStripeFooter(reader.getStripes.get(0))
    +          if (isSelective) {
    +            assert(stripe.getColumns(1).getKind === DICTIONARY_V2)
    --- End diff --
    
    For this, I will update it as follows.
    ```
              assert(stripe.getColumns(1).getKind === DICTIONARY_V2)
              if (isSelective) {
                assert(stripe.getColumns(2).getKind === DIRECT_V2)
              } else {
                assert(stripe.getColumns(2).getKind === DICTIONARY_V2)
              }
              assert(stripe.getColumns(3).getKind === DIRECT)
    ```
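
To inspect the encodings outside the test suite, a small helper along these lines may be useful. It is only a sketch built from the same calls the test uses (`OrcFile.createReader`, `RecordReaderImpl.readStripeFooter`); the helper name `columnEncodingKinds` is hypothetical, and `RecordReaderImpl` is an internal ORC class whose API may differ in other ORC versions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.OrcFile
import org.apache.orc.impl.RecordReaderImpl

// Hypothetical helper: returns the encoding kind of every column in the
// first stripe of the given ORC file, in column-id order.
def columnEncodingKinds(orcFile: String): Seq[String] = {
  val reader = OrcFile.createReader(
    new Path(orcFile), OrcFile.readerOptions(new Configuration()))
  // Assumption: as in the test above, the record reader is a RecordReaderImpl.
  val rows = reader.rows().asInstanceOf[RecordReaderImpl]
  try {
    val footer = rows.readStripeFooter(reader.getStripes.get(0))
    (0 until footer.getColumnsCount).map { i =>
      footer.getColumns(i).getKind.toString
    }
  } finally {
    rows.close()
  }
}
```

For a file written by the test, index 0 of the result is the root struct and indexes 1 to 3 are the three table columns, which are the positions the assertions above check.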


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96949/testReport)** for PR 22622 at commit [`65ac786`](https://github.com/apache/spark/commit/65ac7869188e93473a7fd7f43575b728207ff218).


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96957/testReport)** for PR 22622 at commit [`65ac786`](https://github.com/apache/spark/commit/65ac7869188e93473a7fd7f43575b728207ff218).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    The build failure is unrelated to this PR.
    ```
    [error] (hive-thriftserver/compile:compileIncremental) javac returned nonzero exit code
    [error] Total time: 587 s, completed Oct 4, 2018 8:11:23 PM
    ```


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r223158108
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    https://issues.apache.org/jira/browse/SPARK-25656 is created for that.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222880436
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    Well, Apache ORC is an independent Apache project with its own website and documentation, and we should respect that. If we introduce new ORC configurations one by one on the Apache Spark website, we will eventually duplicate the Apache ORC documentation in the Apache Spark documentation.
    
    We had better guide ORC fans to the Apache ORC website. If something is missing there, they can file an ORC JIRA, not a SPARK JIRA.



---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r223058276
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    That sounds like a different issue. This PR covers both the `TBLPROPERTIES` and `OPTIONS` syntaxes, which were historically designed for this kind of configuration. I mean, this is not a data-source-specific PR. Also, the scope of this PR is only write-side configurations.
    
    In any case, +1 for adding an introduction section with both Parquet and ORC examples there. We had better give both read-side and write-side configuration examples, too. Could you file a JIRA issue for that?


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222868403
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala ---
    @@ -182,4 +182,12 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
           }
         }
       }
    +
    +  test("Enforce direct encoding column-wise selectively") {
    +    Seq(true, false).foreach { convertMetastore =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") {
    +        testSelectiveDictionaryEncoding(isSelective = false)
    --- End diff --
    
    Yep. This is based on the current behavior, which is somewhat related to your CTAS PR. Only the read path works as expected.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96963/
    Test FAILed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96907/testReport)** for PR 22622 at commit [`39b7fd6`](https://github.com/apache/spark/commit/39b7fd63c4ce5cbe6dc628ffb0170aef361461ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    @gatorsmile, could you review this again? In response to your comment, I filed [SPARK-25656 Add an example section about how to use Parquet/ORC library options](https://issues.apache.org/jira/browse/SPARK-25656).


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96957/
    Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    This now looks fixed in https://github.com/apache/spark/commit/5ae20cf1a96a33f5de4435fcfb55914d64466525


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96957/testReport)** for PR 22622 at commit [`65ac786`](https://github.com/apache/spark/commit/65ac7869188e93473a7fd7f43575b728207ff218).


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Could you review this, @gatorsmile and @cloud-fan ?


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    retest this please


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96949/testReport)** for PR 22622 at commit [`65ac786`](https://github.com/apache/spark/commit/65ac7869188e93473a7fd7f43575b728207ff218).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96964/testReport)** for PR 22622 at commit [`70016e4`](https://github.com/apache/spark/commit/70016e4896a42315e82141ac995bf12eded07f51).


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3695/
    Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222855990
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    +               |)
    +            """.stripMargin
    +          case "hive" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |STORED AS ORC
    +               |LOCATION '${dir.toURI}'
    +               |TBLPROPERTIES (
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  hive.exec.orc.dictionary.key.size.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    +               |)
    +            """.stripMargin
    +          case impl =>
    +            throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
    +        }
    +
    +        sql(sqlStatement)
    +        sql(s"INSERT INTO $tableName VALUES ('94086', 'random-uuid-string', 0.0)")
    +
    +        val partFiles = dir.listFiles()
    +          .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
    +        assert(partFiles.length === 1)
    +
    +        val orcFilePath = new Path(partFiles.head.getAbsolutePath)
    +        val readerOptions = OrcFile.readerOptions(new Configuration())
    +        val reader = OrcFile.createReader(orcFilePath, readerOptions)
    +        var recordReader: RecordReaderImpl = null
    +        try {
    +          recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
    +
    +          // Check the kind
    +          val stripe = recordReader.readStripeFooter(reader.getStripes.get(0))
    +          assert(stripe.getColumns(1).getKind === DICTIONARY_V2)
    --- End diff --
    
    Could you write some comments to explain what `DICTIONARY_V2`, `DIRECT_V2` and `DIRECT` are?
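
As a rough guide to the three kinds asked about here, the sketch below models how an ORC writer might pick an encoding for a STRING column. It is a simplified illustration and not the ORC implementation: `EncodingKind`, `EncodingModel` and `stringEncoding` are hypothetical names, and the ratio rule is an assumption based on how `orc.dictionary.key.threshold` is described. Under that assumption, a threshold of `1.0` keeps the single-row test table dictionary-encoded unless a column is listed in `orc.column.encoding.direct`.

```scala
// Illustrative stand-ins for the ORC column encoding kinds named in the test.
// These are NOT the real org.apache.orc.OrcProto classes.
sealed trait EncodingKind
// Values are stored as-is; DOUBLE columns always use this kind.
case object DIRECT extends EncodingKind
// String bytes are stored as-is; their lengths are run-length encoded (RLE v2).
case object DIRECT_V2 extends EncodingKind
// Distinct strings are stored once in a dictionary; rows hold RLE v2 indexes.
case object DICTIONARY_V2 extends EncodingKind

object EncodingModel {
  // Hypothetical model of the writer's choice for one STRING column.
  def stringEncoding(
      column: String,
      distinctValues: Long,
      nonNullRows: Long,
      dictionaryKeyThreshold: Double, // models orc.dictionary.key.threshold
      directColumns: Set[String]      // models orc.column.encoding.direct
  ): EncodingKind = {
    val ratio =
      if (nonNullRows > 0) distinctValues.toDouble / nonNullRows else 0.0
    if (directColumns.contains(column)) DIRECT_V2 // forced direct encoding
    else if (ratio <= dictionaryKeyThreshold) DICTIONARY_V2
    else DIRECT_V2
  }
}
```

With one inserted row and a threshold of `1.0`, this model yields `DICTIONARY_V2` for `zipcode` and `DIRECT_V2` for `uniqColumn` when the latter is listed as a direct column, which matches the kinds the test checks.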


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Thank you for the review, @HyukjinKwon . Sure, I'll update like that.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    retest this please


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222865396
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala ---
    @@ -182,4 +182,12 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
           }
         }
       }
    +
    +  test("Enforce direct encoding column-wise selectively") {
    +    Seq(true, false).foreach { convertMetastore =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") {
    +        testSelectiveDictionaryEncoding(isSelective = false)
    --- End diff --
    
    So even with `CONVERT_METASTORE_ORC` as true, we still can't use selective direct encoding?


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96964/testReport)** for PR 22622 at commit [`70016e4`](https://github.com/apache/spark/commit/70016e4896a42315e82141ac995bf12eded07f51).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Thank you, @gatorsmile, @HyukjinKwon , @viirya , @dilipbiswal !


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3685/
    Test PASSed.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222535182
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,71 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uuid STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uuid'
    +               |)
    +            """.stripMargin
    +          case "hive" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uuid STRING, value DOUBLE)
    +               |STORED AS ORC
    +               |LOCATION '${dir.toURI}'
    +               |TBLPROPERTIES (
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  hive.exec.orc.dictionary.key.size.threshold '1.0',
    +               |  orc.column.encoding.direct 'uuid'
    +               |)
    +            """.stripMargin
    +          case impl =>
    +            throw new UnsupportedOperationException(s"Unknown ORC implementation: $impl")
    +        }
    +
    +        sql(sqlStatement)
    +        sql(s"INSERT INTO $tableName VALUES ('94086', 'random-uuid-string', 0.0)")
    +
    +        val partFiles = dir.listFiles()
    +          .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
    +        assert(partFiles.length === 1)
    +
    +        val orcFilePath = new Path(partFiles.head.getAbsolutePath)
    +        val readerOptions = OrcFile.readerOptions(new Configuration())
    +        val reader = OrcFile.createReader(orcFilePath, readerOptions)
    +        var recordReader: RecordReaderImpl = null
    +        try {
    +          recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
    +
    +          // Check the kind
    +          val stripe = recordReader.readStripeFooter(reader.getStripes.get(0))
    +          if (isSelective) {
    +            assert(stripe.getColumns(1).getKind === DICTIONARY_V2)
    --- End diff --
    
    @dongjoon-hyun, how about:
    
    ```
    assert(stripe.getColumns(1).getKind === DICTIONARY_V2)
    assert(stripe.getColumns(3).getKind === DIRECT)
    if (isSelective) {
      assert(stripe.getColumns(2).getKind === DIRECT_V2)
    } else {
      assert(stripe.getColumns(2).getKind === DICTIONARY_V2)
    }
    ```
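
The column indexes in these assertions start at 1 because index 0 in the stripe footer is the root struct. A small standalone helper along the following lines could print the kind of every column in a written file. It is a sketch that assumes the ORC 1.5.x and Hadoop classes used by the test are on the classpath; `OrcEncodingInspector` and `firstStripeEncodings` are hypothetical names, and the cast to `RecordReaderImpl` relies on ORC internals just as the test does.

```scala
import scala.collection.JavaConverters._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.OrcFile
import org.apache.orc.impl.RecordReaderImpl

object OrcEncodingInspector {
  /** Returns the encoding kind name of every column in the first stripe. */
  def firstStripeEncodings(orcFile: String): Seq[String] = {
    val readerOptions = OrcFile.readerOptions(new Configuration())
    val reader = OrcFile.createReader(new Path(orcFile), readerOptions)
    // Internal ORC class; the same cast appears in the test under review.
    val recordReader = reader.rows.asInstanceOf[RecordReaderImpl]
    try {
      val footer = recordReader.readStripeFooter(reader.getStripes.get(0))
      // Entry 0 describes the root struct; table columns start at entry 1.
      footer.getColumnsList.asScala.map(_.getKind.name).toList
    } finally {
      recordReader.close()
    }
  }

  def main(args: Array[String]): Unit =
    firstStripeEncodings(args(0)).zipWithIndex.foreach { case (kind, i) =>
      println(s"column $i: $kind")
    }
}
```

The path of a single ORC part file is passed as the first argument; the output depends on the file, so no expected output is shown here.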


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222911645
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    Also give an example?


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222876905
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    This new feature needs a doc update. We need to let our end users know how to use it.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222871049
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala ---
    @@ -182,4 +182,12 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
           }
         }
       }
    +
    +  test("Enforce direct encoding column-wise selectively") {
    +    Seq(true, false).foreach { convertMetastore =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> s"$convertMetastore") {
    +        testSelectiveDictionaryEncoding(isSelective = false)
    --- End diff --
    
    Ok. I see. Thanks.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96963/testReport)** for PR 22622 at commit [`70016e4`](https://github.com/apache/spark/commit/70016e4896a42315e82141ac995bf12eded07f51).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96977/
    Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Thanks! Merged to master.


---



[GitHub] spark pull request #22622: [SPARK-25635][SQL][BUILD] Support selective direc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22622#discussion_r222911529
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -115,6 +116,69 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
         }
       }
     
    +  protected def testSelectiveDictionaryEncoding(isSelective: Boolean) {
    +    val tableName = "orcTable"
    +
    +    withTempDir { dir =>
    +      withTable(tableName) {
    +        val sqlStatement = orcImp match {
    +          case "native" =>
    +            s"""
    +               |CREATE TABLE $tableName (zipcode STRING, uniqColumn STRING, value DOUBLE)
    +               |USING ORC
    +               |OPTIONS (
    +               |  path '${dir.toURI}',
    +               |  orc.dictionary.key.threshold '1.0',
    +               |  orc.column.encoding.direct 'uniqColumn'
    --- End diff --
    
    I am fine either way. However, our current doc does not explain that we pass data-source-specific options through to the underlying data source:
    
    https://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options
    
    Could you help improve it? 
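
For the DataFrame API, the same write-side options can be passed through `DataFrameWriter.option`. The following is a minimal sketch, assuming a Spark build that bundles ORC 1.5.3 or later and uses the native ORC implementation. The object name, the local master setting and the output path are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object OrcWriteOptionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcWriteOptionsExample")
      .master("local[*]") // assumption: a local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Same shape as the table created in the test under review.
    val df = Seq(("94086", "random-uuid-string", 0.0))
      .toDF("zipcode", "uniqColumn", "value")

    df.write
      .format("orc")
      .mode("overwrite")
      // Options with the "orc." prefix are handed to the ORC writer.
      .option("orc.dictionary.key.threshold", "1.0")
      .option("orc.column.encoding.direct", "uniqColumn")
      .save("/tmp/orc_selective_direct") // hypothetical output path

    spark.stop()
  }
}
```

Here `zipcode` stays eligible for dictionary encoding, while `uniqColumn` is requested to be written with direct encoding.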



---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22622: [SPARK-25635][SQL][BUILD] Support selective direct encod...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22622
  
    **[Test build #96977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96977/testReport)** for PR 22622 at commit [`70016e4`](https://github.com/apache/spark/commit/70016e4896a42315e82141ac995bf12eded07f51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
