You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/11/03 06:50:07 UTC

[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/22932

    [SPARK-25102][SQL] Write Spark version to ORC/Parquet file metadata

    ## What changes were proposed in this pull request?
    
    Currently, Spark writes Spark version number into Hive Table properties with `spark.sql.create.version`.
    ```
    parameters:{
      spark.sql.sources.schema.part.0={
        "type":"struct",
        "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
      },
      transient_lastDdlTime=1541142761, 
      spark.sql.sources.schema.numParts=1,
      spark.sql.create.version=2.4.0
    }
    ```
    
    This PR aims to write Spark versions to ORC/Parquet file metadata with `org.apache.spark.sql.create.version`. It's different from Hive Table property key `spark.sql.create.version`, but it seems that we cannot change that for backward compatibility.
    
    **ORC (`native` and `hive` implmentation)**
    ```
    File Version: 0.12 with ORC_135
    ...
    User Metadata:
      org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
    ```
    
    **PARQUET**
    ```
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins with newly added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-25102

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22932
    
----
commit 601ccbb4e20a068469839bc71870230cfb6fd7a1
Author: Dongjoon Hyun <do...@...>
Date:   2018-11-03T06:43:48Z

    [SPARK-25102][SQL] Write Spark version to ORC/Parquet file metadata

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232430599
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
     
       override def close(): Unit = {
         if (recordWriterInstantiated) {
    +      // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
    +      try {
    +        val writerField = recordWriter.getClass.getDeclaredField("writer")
    +        writerField.setAccessible(true)
    +        val writer = writerField.get(recordWriter).asInstanceOf[Writer]
    +        writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
    +      } catch {
    +        case NonFatal(e) => log.warn(e.toString, e)
    +      }
    --- End diff --
    
    For this case, I'll refactor out all the new code (line 281 ~ 289).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232444034
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
    @@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
       private[this] val serializer = new OrcSerializer(dataSchema)
     
       private val recordWriter = {
    -    new OrcOutputFormat[OrcStruct]() {
    +    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
           override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
             new Path(path)
           }
    -    }.getRecordWriter(context)
    +    }
    +    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
    +    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
    +    val writer = OrcFile.createWriter(filename, options)
    +    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
    --- End diff --
    
    Right. To avoid reflection, this was the only way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4741/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98430/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4734/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4867/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98420/testReport)** for PR 22932 at commit [`601ccbb`](https://github.com/apache/spark/commit/601ccbb4e20a068469839bc71870230cfb6fd7a1).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    double checked. A late LGTM too


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232428173
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
    @@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(
       private[this] val serializer = new OrcSerializer(dataSchema)
     
       private val recordWriter = {
    -    new OrcOutputFormat[OrcStruct]() {
    +    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
           override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
             new Path(path)
           }
    -    }.getRecordWriter(context)
    +    }
    +    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
    +    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
    +    val writer = OrcFile.createWriter(filename, options)
    +    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
    +    writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
    --- End diff --
    
    Thank you for review, @gatorsmile . Sure. I'll refactor out the following line.
    ```
    writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98420/testReport)** for PR 22932 at commit [`601ccbb`](https://github.com/apache/spark/commit/601ccbb4e20a068469839bc71870230cfb6fd7a1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4733/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r230564513
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
    @@ -93,7 +93,7 @@ Partition Values    	[ds=2017-08-01, hr=10]
     Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10	                    
     Created Time [not included in comparison]
     Last Access [not included in comparison]
    -Partition Statistics	1121 bytes, 3 rows  	                    
    +Partition Statistics	1229 bytes, 3 rows  	                    
    --- End diff --
    
    Right, @gatorsmile .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98456/testReport)** for PR 22932 at commit [`ef49a27`](https://github.com/apache/spark/commit/ef49a277d3fd39c6fd91b3fcda65f660b833ec95).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98429/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98420/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)** for PR 22932 at commit [`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    LGTM. Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98625/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98625/testReport)** for PR 22932 at commit [`396540a`](https://github.com/apache/spark/commit/396540ad6ca4565bf4ed6aa3fd4f61766b579994).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Could you review this please, @gatorsmile ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Does it have different values for
    
    new native ORC writer, old Hive ORC writer
    
    ________________________________
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r231775619
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
    @@ -93,7 +93,7 @@ Partition Values    	[ds=2017-08-01, hr=10]
     Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10	                    
     Created Time [not included in comparison]
     Last Access [not included in comparison]
    -Partition Statistics	1121 bytes, 3 rows  	                    
    +Partition Statistics	1229 bytes, 3 rows  	                    
    --- End diff --
    
    Nice catch! Hmm. I think we should not measure the bytes in the test case.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232443802
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
    @@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
       private[this] val serializer = new OrcSerializer(dataSchema)
     
       private val recordWriter = {
    -    new OrcOutputFormat[OrcStruct]() {
    +    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
           override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
             new Path(path)
           }
    -    }.getRecordWriter(context)
    +    }
    +    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
    +    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
    +    val writer = OrcFile.createWriter(filename, options)
    +    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
    --- End diff --
    
    This is basically copied from getRecordWriter


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r230604337
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/package.scala ---
    @@ -44,4 +44,13 @@ package object sql {
       type Strategy = SparkStrategy
     
       type DataFrame = Dataset[Row]
    +
    +  /**
    +   * Metadata key which is used to write Spark version in the followings:
    +   * - Parquet file metadata
    +   * - ORC file metadata
    +   *
    +   * Note that Hive table property `spark.sql.create.version` also has Spark version.
    +   */
    +  private[sql] val CREATE_VERSION = "org.apache.spark.sql.create.version"
    --- End diff --
    
    Is this a pre-existing key? Seems that `org.apache.spark.version` should be enough.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4807/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232428893
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
     
       override def close(): Unit = {
         if (recordWriterInstantiated) {
    +      // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
    +      try {
    +        val writerField = recordWriter.getClass.getDeclaredField("writer")
    +        writerField.setAccessible(true)
    +        val writer = writerField.get(recordWriter).asInstanceOf[Writer]
    +        writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
    +      } catch {
    +        case NonFatal(e) => log.warn(e.toString, e)
    +      }
    --- End diff --
    
    BTW, as you expected, we cannot use a single function for this. The `Writer` are not the same.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Thank you for review, @felixcheung . Could you elaborate a little bit more? Here, three writers are used: new native ORC writer, old Hive ORC writer, and native Parquet writer.
    > a prop for whether it's written using the native writer?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4756/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22932


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98421/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r230547020
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
    @@ -314,6 +316,21 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
           checkAnswer(spark.read.orc(path.getCanonicalPath), Row(ts))
         }
       }
    +
    --- End diff --
    
    Please note that the following test case is executed twice; `OrcSourceSuite` and `HiveOrcSourceSuite`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232424626
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
    @@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(
       private[this] val serializer = new OrcSerializer(dataSchema)
     
       private val recordWriter = {
    -    new OrcOutputFormat[OrcStruct]() {
    +    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
           override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
             new Path(path)
           }
    -    }.getRecordWriter(context)
    +    }
    +    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
    +    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
    +    val writer = OrcFile.createWriter(filename, options)
    +    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
    +    writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
    --- End diff --
    
    Could we create a separate function for adding these metadata?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Yes. It does. If you use `spark.sql.orc.impl=hive`. It has a different version number like the following.
    ```
    File Version: 0.12 with HIVE_8732
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    The last commit will pass the test. The previous one fails due to `spaces at the end`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98539/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98456/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98671/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r231777190
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
    @@ -93,7 +93,7 @@ Partition Values    	[ds=2017-08-01, hr=10]
     Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10	                    
     Created Time [not included in comparison]
     Last Access [not included in comparison]
    -Partition Statistics	1121 bytes, 3 rows  	                    
    +Partition Statistics	1229 bytes, 3 rows  	                    
    --- End diff --
    
    Hmmm .. yea, I think we should avoid ..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4905/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r231791191
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
    @@ -93,7 +93,7 @@ Partition Values    	[ds=2017-08-01, hr=10]
     Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10	                    
     Created Time [not included in comparison]
     Last Access [not included in comparison]
    -Partition Statistics	1121 bytes, 3 rows  	                    
    +Partition Statistics	1229 bytes, 3 rows  	                    
    --- End diff --
    
    It's filed and I made [a PR for SPARK-25971](https://github.com/apache/spark/pull/22972) for SQLQueryTestSuite.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98539/testReport)** for PR 22932 at commit [`ef49a27`](https://github.com/apache/spark/commit/ef49a277d3fd39c6fd91b3fcda65f660b833ec95).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    I see. Thanks, @gatorsmile .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4740/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)** for PR 22932 at commit [`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r231775020
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
    @@ -93,7 +93,7 @@ Partition Values    	[ds=2017-08-01, hr=10]
     Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10	                    
     Created Time [not included in comparison]
     Last Access [not included in comparison]
    -Partition Statistics	1121 bytes, 3 rows  	                    
    +Partition Statistics	1229 bytes, 3 rows  	                    
    --- End diff --
    
    Hm, does it mean that basically the tests will be failed or fixed for official releases (since it doesn't have `-SNAPSHOT`)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98456/testReport)** for PR 22932 at commit [`ef49a27`](https://github.com/apache/spark/commit/ef49a277d3fd39c6fd91b3fcda65f660b833ec95).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98430/testReport)** for PR 22932 at commit [`f5d35b4`](https://github.com/apache/spark/commit/f5d35b42092a7af2b545b4145daf9172ea6a8e32).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98429/testReport)** for PR 22932 at commit [`1ed6368`](https://github.com/apache/spark/commit/1ed63683f2f8f7361a83892d38c84f40e2464590).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4735/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Will take a look this week. Thanks for your work!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98421/testReport)** for PR 22932 at commit [`601ccbb`](https://github.com/apache/spark/commit/601ccbb4e20a068469839bc71870230cfb6fd7a1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r230610261
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/package.scala ---
    @@ -44,4 +44,13 @@ package object sql {
       type Strategy = SparkStrategy
     
       type DataFrame = Dataset[Row]
    +
    +  /**
    +   * Metadata key which is used to write Spark version in the followings:
    +   * - Parquet file metadata
    +   * - ORC file metadata
    +   *
    +   * Note that Hive table property `spark.sql.create.version` also has Spark version.
    +   */
    +  private[sql] val CREATE_VERSION = "org.apache.spark.sql.create.version"
    --- End diff --
    
    Thank you for review, @hvanhovell . Yes, we can use that `org.apache.spark.version` since this is a new key.
    
    Although Hive table property `spark.sql.create.version` has `.create.` part, it seems that we don't need to follow that convention here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98671/testReport)** for PR 22932 at commit [`04457be`](https://github.com/apache/spark/commit/04457be5bc8e6023a9b9c2e71f9a123869465cbd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98539/testReport)** for PR 22932 at commit [`ef49a27`](https://github.com/apache/spark/commit/ef49a277d3fd39c6fd91b3fcda65f660b833ec95).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r230563752
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
    @@ -93,7 +93,7 @@ Partition Values    	[ds=2017-08-01, hr=10]
     Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10	                    
     Created Time [not included in comparison]
     Last Access [not included in comparison]
    -Partition Statistics	1121 bytes, 3 rows  	                    
    +Partition Statistics	1229 bytes, 3 rows  	                    
    --- End diff --
    
    This is caused by adding `org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Could you review this, @gatorsmile ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    @felixcheung . If the question is about writer versions, Spark/Hive works on top of ORC/Parquet library. ORC/Parquet library already writes its specific version for that purpose. For me, it looks enough.
    
    **ORC**
    ```
    $ orc-tools meta /tmp/o
    File Version: 0.12 with ORC_135
    ```
    
    **Parquet**
    ```
    $ parquet-tools meta /tmp/o
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22932#discussion_r232424657
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
     
       override def close(): Unit = {
         if (recordWriterInstantiated) {
    +      // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
    +      try {
    +        val writerField = recordWriter.getClass.getDeclaredField("writer")
    +        writerField.setAccessible(true)
    +        val writer = writerField.get(recordWriter).asInstanceOf[Writer]
    +        writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
    +      } catch {
    +        case NonFatal(e) => log.warn(e.toString, e)
    +      }
    --- End diff --
    
    The same comment here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Thank you so much!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98625/testReport)** for PR 22932 at commit [`396540a`](https://github.com/apache/spark/commit/396540ad6ca4565bf4ed6aa3fd4f61766b579994).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98671/testReport)** for PR 22932 at commit [`04457be`](https://github.com/apache/spark/commit/04457be5bc8e6023a9b9c2e71f9a123869465cbd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22932
  
    **[Test build #98421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98421/testReport)** for PR 22932 at commit [`601ccbb`](https://github.com/apache/spark/commit/601ccbb4e20a068469839bc71870230cfb6fd7a1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org