Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2018/04/18 05:33:19 UTC

[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/21093

    [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3

    ## What changes were proposed in this pull request?
    
    This PR updates the Apache ORC dependencies to 1.4.3, released on February 9th. The Apache ORC 1.4.2 release removes unnecessary dependencies, and 1.4.3 adds 5 more patches (https://s.apache.org/Fll8).
    
    In particular, ORC-285, reproduced below, is fixed in 1.4.3.
    
    ```scala
    scala> val df = Seq(Array.empty[Float]).toDF()
    
    scala> df.write.format("orc").save("/tmp/floatarray")
    
    scala> spark.read.orc("/tmp/floatarray")
    res1: org.apache.spark.sql.DataFrame = [value: array<float>]
    
    scala> spark.read.orc("/tmp/floatarray").show()
    18/02/12 22:09:10 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
    java.io.IOException: Error reading file: file:/tmp/floatarray/part-00000-9c0b461b-4df1-4c23-aac1-3e4f349ac7d6-c000.snappy.orc
    	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
    	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:78)
    ...
    Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 2 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
    ```
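    
    The reproduction above can be turned into a small round-trip check. The following is only a minimal sketch, assuming Spark 2.3 or later on the classpath and a local master; the object name `EmptyArrayOrcRoundTrip` and the helper `arrayLengths` are hypothetical and not part of this patch. It sets `spark.sql.orc.impl` to `native` so that the Apache ORC reader is the one exercised.
    
    ```scala
    import java.nio.file.Files

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical helper, not part of the patch.
    object EmptyArrayOrcRoundTrip {

      // Writes `df` (one array column) as ORC into a fresh directory, reads it
      // back, and returns the length of the array found in each row.
      def arrayLengths(spark: SparkSession, df: DataFrame): Array[Int] = {
        // save() needs a path that does not exist yet, so use a child of a new temp dir.
        val dir = Files.createTempDirectory("orc-empty-array").resolve("data").toString
        df.write.format("orc").save(dir)
        // With the ORC version affected by ORC-285, this read is where the
        // EOFException from the reproduction above would surface.
        spark.read.orc(dir).collect().map(_.getSeq[Any](0).length)
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("orc-285-check")
          .config("spark.sql.orc.impl", "native") // use the Apache ORC based reader
          .getOrCreate()
        import spark.implicits._
        try {
          val frames = Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF())
          frames.foreach { df =>
            val lengths = arrayLengths(spark, df)
            // One row is expected, holding an empty array.
            assert(lengths.sameElements(Array(0)), s"unexpected lengths: ${lengths.mkString(",")}")
          }
        } finally {
          spark.stop()
        }
      }
    }
    ```
    
    Covering both `Float` and `Double` mirrors the float/double wording of the test title in this PR; the temporary directory is left in place and would need cleaning up in a real test.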
    
    ## How was this patch tested?
    
    Pass the Jenkins test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-23340-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21093.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21093
    
----
commit fc5d976ffb33ebec996415ac1296196f8458a01f
Author: Dongjoon Hyun <do...@...>
Date:   2018-02-17T08:25:36Z

    [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3
    
    This PR updates Apache ORC dependencies to 1.4.3 released on February 9th. Apache ORC 1.4.2 release removes unnecessary dependencies and 1.4.3 has 5 more patches (https://s.apache.org/Fll8).
    
    Especially, the following ORC-285 is fixed at 1.4.3.
    
    ```scala
    scala> val df = Seq(Array.empty[Float]).toDF()
    
    scala> df.write.format("orc").save("/tmp/floatarray")
    
    scala> spark.read.orc("/tmp/floatarray")
    res1: org.apache.spark.sql.DataFrame = [value: array<float>]
    
    scala> spark.read.orc("/tmp/floatarray").show()
    18/02/12 22:09:10 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
    java.io.IOException: Error reading file: file:/tmp/floatarray/part-00000-9c0b461b-4df1-4c23-aac1-3e4f349ac7d6-c000.snappy.orc
    	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
    	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:78)
    ...
    Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 2 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
    ```
    
    Pass the Jenkins test.
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #20511 from dongjoon-hyun/SPARK-23340.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89507/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2445/
    Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89532/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).


---



[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21093#discussion_r182814133
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala ---
    @@ -208,4 +208,14 @@ class HiveOrcQuerySuite extends OrcQueryTest with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-23340 Empty float/double array columns raise EOFException") {
    --- End diff --
    
    Never mind. I found that the original PR has them too; this is just a backport. We normally refer to the original PR (https://github.com/apache/spark/pull/20511) in the PR description.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    retest this please


---



[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun closed the pull request at:

    https://github.com/apache/spark/pull/21093


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21093#discussion_r182813427
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala ---
    @@ -208,4 +208,14 @@ class HiveOrcQuerySuite extends OrcQueryTest with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-23340 Empty float/double array columns raise EOFException") {
    --- End diff --
    
    Just to confirm: these two tests are in the master branch, right?


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89484/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89489/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2417/
    Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89507/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    @gatorsmile, sorry for the late response. I'm currently at Dataworks Summit Berlin.
    
    I took a look. It seems that the last two failures come from `JsonInferSchema` in `BucketedWriteWithoutHiveSupportSuite`.
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89507/testReport/
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89522/testReport/
    
    
    ```
    [info] - write bucketed data *** FAILED *** (4 seconds, 698 milliseconds)
    [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 84.0 failed 1 times, most recent failure: Lost task 0.0 in stage 84.0 (TID 86, localhost, executor driver): java.lang.IllegalStateException: LiveListenerBus is stopped.
    [info] 	at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
    [info] 	at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
    [info] 	at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
    [info] 	at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
    [info] 	at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
    [info] 	at scala.Option.getOrElse(Option.scala:121)
    [info] 	at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
    [info] 	at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
    [info] 	at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
    [info] 	at org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
    [info] 	at org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
    [info] 	at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
    [info] 	at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
    [info] 	at scala.Option.map(Option.scala:146)
    [info] 	at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
    [info] 	at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
    [info] 	at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
    [info] 	at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
    [info] 	at org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
    [info] 	at org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
    [info] 	at org.apache.spark.sql.execution.datasources.json.JsonInferSchema$.compatibleType(JsonInferSchema.scala:271)
    [info] 	at org.apache.spark.sql.execution.datasources.json.JsonInferSchema$$anonfun$org$apache$spark$sql$execution$datasources$json$JsonInferSchema$$compatibleRootType$1.apply(JsonInferSchema.scala:262)
    [info] 	at 
    ```
    
    I compared against `branch-2.3` itself. Unfortunately, `branch-2.3` has recently been unstable: none of the last 18 SBT build runs succeeded.
    
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89489/
    Test FAILed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2454/
    Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89484/
    Test FAILed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89522/
    Test FAILed.


---



[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun closed the pull request at:

    https://github.com/apache/spark/pull/21093


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2422/
    Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    LGTM
    
    Also cc @omalley 


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Oops. I mistakenly clicked the `close and comments` button.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Thank you all! :D


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89507/
    Test FAILed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89489/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    `BucketedWriteWithoutHiveSupportSuite` tests `Seq("parquet", "json")`, and the test suite fails after `insertInto`, at `write bucketed data`. Since neither format is ORC, the flakiness seems to be unrelated to this patch. Also, the final run passed 8 hours ago.
    
    ```
    [info] BucketedWriteWithoutHiveSupportSuite:
    [info] - bucketed by non-existing column (28 milliseconds)
    [info] - numBuckets be greater than 0 but less than 100000 (10 milliseconds)
    [info] - specify sorting columns without bucketing columns (8 milliseconds)
    [info] - sorting by non-orderable column (34 milliseconds)
    [info] - write bucketed data using save() (9 milliseconds)
    [info] - write bucketed data using insertInto() (9 milliseconds)
    ... Error starts
    [info] - write bucketed data *** FAILED *** (4 seconds, 601 milliseconds)
    ```


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    @dongjoon-hyun Do you have the bandwidth to look into why these tests are flaky?


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    retest this please


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

Posted by dongjoon-hyun <gi...@git.apache.org>.
GitHub user dongjoon-hyun reopened a pull request:

    https://github.com/apache/spark/pull/21093



---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    LGTM


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89532/
    Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89522/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2433/


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    retest this please


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89522/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    retest this please


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Thank you for the review, @cloud-fan and @gatorsmile.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89532/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    **[Test build #89484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89484/testReport)** for PR 21093 at commit [`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21093
  
    Thanks! Merged to 2.3


---
