You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by steveloughran <gi...@git.apache.org> on 2018/08/22 17:26:52 UTC

[GitHub] spark pull request #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark...

GitHub user steveloughran opened a pull request:

    https://github.com/apache/spark/pull/22186

    [SPARK-25183][SQL] Spark HiveServer2 to use Spark ShutdownHookManager

    ## What changes were proposed in this pull request?
    
    Switch `org.apache.hive.service.server.HiveServer2` to register its shutdown callback with Spark's `ShutdownHookManager`, rather than direct with the Java Runtime callback.
    
    This avoids race conditions in shutdown where the filesystem is shutdown before the flush/write/rename of the event log is completed, particularly on object stores where the write and rename can be slow.
    
    ## How was this patch tested?
    
    There's no explicit unit for test this, which is consistent with every other shutdown hook in the codebase.
    
    * There's an implicit test when the scalatest process is halted.
    * More manual/integration testing is needed.
    
    HADOOP-15679 has added the ability to explicitly execute the hadoop shutdown hook sequence which spark uses; that could be stabilized for testing if desired, after which all the spark hooks could be tested. Until then: external system tests only.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark BUG/SPARK-25183-shutdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22186
    
----
commit 8fbeb59c8a96dd7f7ed6982009bac59ab3fa87ce
Author: Steve Loughran <st...@...>
Date:   2018-08-22T17:13:19Z

    SPARK-25183 Spark HiveServer2 to use Spark ShutdownHookManager for shutdown hook
    
    Change-Id: I9a0885660efda4ec6277e0237ca7eada0b43533f

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    my local maven build *did* work, so maybe its a javac/JVM version thing. Will move back to a java class callback.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merging to master branch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    I see. Thanks for the explain, I checked the code again, yes you're right. Let me retrigger the test again, will merge it if everything is fine.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    My local build wasn't including that module; it now does and the link works with a subclass of `AbstractFunction0<BoxUnit>`.
    
    The local tests are failing under maven with hive/jackson mismatch though. I'm going to consider that a separate issue.
    
    ```
    #c=cvalue;d=dvalue
    - SPARK-16563 ThriftCLIService FetchResults repeat fetching result *** FAILED ***
      java.sql.SQLException: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse$default$3()Z
      at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest$$anonfun$withMultipleConnectionJdbcStatement$2.apply(HiveThriftServer2Suites.scala:814)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest$$anonfun$withMultipleConnectionJdbcStatement$2.apply(HiveThriftServer2Suites.scala:813)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.withMultipleConnectionJdbcStatement(HiveThriftServer2Suites.scala:813)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.withJdbcStatement(HiveThriftServer2Suites.scala:822)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(HiveThriftServer2Suites.scala:100)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(HiveThriftServer2Suites.scala:96)
      at org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.org$apache$spark$sql$hive$thriftserver$HiveThriftBinaryServerSuite$$withCLIServiceClient(HiveThriftServer2Suites.scala:71)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95518/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2723/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    **[Test build #95385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95385/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    The latest patch builds locally
    
    Maven test outcome
    * lots of json missing method errors, clearly jackson version problems of some kind
    * I don't see log messages of hive shutdown appearing in the output, though after all the tests finish I do get a log showing the FS cleanup is going on
    
    ```
    18/08/28 22:09:58 INFO ShutdownHookManager: Shutdown hook called
    18/08/28 22:09:58 INFO ShutdownHookManager: Deleting directory ...spark/sql/hive-thriftserver/target/tmp/
    ```
    
    I think it might be possible to actually test whether the shutdown hook was added by calling remove(hook) in a test and verifying that the hook was found, that is : it was registered. Some caching of the hook and a package-level removeHook method in the HiveServer, though wiring it all the way up to a test case would be tricky...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2453/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    **[Test build #95114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95114/testReport)** for PR 22186 at commit [`8fbeb59`](https://github.com/apache/spark/commit/8fbeb59c8a96dd7f7ed6982009bac59ab3fa87ce).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    **[Test build #95518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Not sure what is up with the build here; worked with mvn locally. Possibly my use of a java 8 lamda-expression as the hook?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    **[Test build #95518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    My local maven build also failed.
    
    I think the problem is that`ShutdownHookManager` is implemented in Scala, the complied method signature may be different when invoked from Java, I'm not sure how Scala anonymous function is translated to Java, but it seems like due to this issue.
    
    (Maven has some detailed failure information, whereas SBT doesn't have anything).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22186


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95114/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    **[Test build #95385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95385/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2646/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    This will eliminate a race condition between FS shutdown (in the hadoop shutdown manager) and the hive callback. Theres a risk today that the filesystems will be closed before that event log close()/rename() is called, so things don't get saved —and this can happen with any FS.
    
    registering the shutdown hook via the spark APIs, with a priority > than the FS shutdown, guarantees that it will be called before the FS shutdown. But it doesn't guarantee that the operation will complete within the 10s time limit hard coded into Hadoop 2.8.x+ for any single shutdown hook to complete. It is going to work in HDFS except in the special case of HDFS NN lock or GC pause.
    
    The Hadoop configurable delay of [HADOOP-15679](https://issues.apache.org/jira/browse/HADOOP-15679) needs to go in. I've increased the default timeout to 30s there for more forgiveness with HDFS, and for object stores with O(data) renames people should configure it with a timeout of minutes, or, if they want to turn it off altogether, hours. 
    
    I'm backporting HADOOP-15679 to all branches 2.8.x+, so all hadoop versions with that timeout will have the timeout configurable & the default time extended.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    The fix itself LGTM, but I don't think this could solve the STS shutdown hook conflict problem with Hadoop.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL][WIP] Spark HiveServer2 to use Spark S...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95385/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22186
  
    **[Test build #95114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95114/testReport)** for PR 22186 at commit [`8fbeb59`](https://github.com/apache/spark/commit/8fbeb59c8a96dd7f7ed6982009bac59ab3fa87ce).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org