You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@livy.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/10/15 03:09:54 UTC

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/incubator-livy/pull/121

    [WIP][LIVY-518][BUILD] Support Spark 2.4

    ## What changes were proposed in this pull request?
    
    This PR proposes to support Spark 2.4 in Livy.
    
    ## How was this patch tested?
    
    Unit tests and manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/incubator-livy spark24-support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-livy/pull/121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #121
    
----
commit 7393630fed6b5f95ef053a1f31bf7d233dd6c6fe
Author: hyukjinkwon <gu...@...>
Date:   2018-10-15T03:08:18Z

    [LIVY-518][BUILD] Support Spark 2.4

----


---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r231110630
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    did you manage to find out something more about this?


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    cc @vanzin, @mgaido91 (already you're here tho :-) ) and @jerryshao.


---

[GitHub] incubator-livy issue #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    Hmm... it might be good to open a Spark bug. Should be easy to create a repro - basically what you have in the shell session you tried above, but in a small scala class. (Start session - use it - stop it - remove metastore db - start new session - try to use it.)
    
    Given that this seems to be a Spark issue, guess we'll go with the test changes.


---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r231343333
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    Okay, let me take a look and get back to you soon. 


---

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r225189164
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    > I am not 100% sure why the test becomes failed. The root cause seems that multiple Hive clients(?) access to derby and metastore as it creates Hive client multiple times(?).
    
    Wouldn't it fail with 2.3 too then? I don't expect any difference among the 2 versions on that. Am I missing something?


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    2.4 is not yet officially released. I am manually verifying this within my local.


---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-livy/pull/121


---

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r225185417
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    I am running Travis CI against RC3 (https://travis-ci.org/HyukjinKwon/incubator-livy/builds/441687251)


---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r231341354
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    I think this deserves some more investigation. It seems to maybe be an issue in Spark, even if it might not affect Livy outside of tests.
    
    If I re-enable the old test, and comment out these lines, things pass:
    
    ```
          File metaStore = new File("metastore_db");
          if (metaStore.exists()) {
            FileUtils.deleteDirectory(metaStore);
          }
    ```
    
    This suite runs multiple different SparkContext / SparkSession instances in the same JVM. Each one is supposed to be independent of one another, so the lines above shouldn't cause problems.
    
    Since they seem to be causing problems, either Livy is not shutting down sessions appropriately, or Spark is keeping state across different SparkSession initializations. Neither is very good...


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    I will update after 2.4.0 is officially released.


---

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r225081934
  
    --- Diff: README.md ---
    @@ -57,7 +57,7 @@ Required python packages for building Livy:
     To run Livy, you will also need a Spark installation. You can get Spark releases at
     https://spark.apache.org/downloads.html.
     
    -Livy requires Spark 2.2 or 2.3. You can switch to a different version of Spark by setting the
    +Livy requires Spark 2.2, 2.3 or 2.4. You can switch to a different version of Spark by setting the
    --- End diff --
    
    nit: 2.2+?


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    Hi, @vanzin , @mgaido91 , @HyukjinKwon . 
    Apache Spark 2.4.0 is on Maven Central now. Can we proceed this?


---

[GitHub] incubator-livy issue #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    Thanks guys!


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    @vanzin, looks https://github.com/apache/spark/commit/a75571b46f813005a6d4b076ec39081ffab11844#diff-f697551d2f00bfb336406b6fe6b516fe causes the test failure. At the very least, I can see the unittests pass without that specific commit. I suspect somehow different derby is picked up and it behaves differently, which causes a test failure related - which is pretty a Spark issue or misuse of restarting Hive session.
    
    The test case fixes are actually more correct because it enables what it needs.


---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r231342303
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    @vanzin, thanks. I will investigate it within few days ... but would you mind if I do that as a followup job? I still suspect if it's specific to tests and shouldn't affect the functionailities within Livy.


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    As of RC3, all the unit tests were passed (https://travis-ci.org/HyukjinKwon/incubator-livy/builds/441687251).
    
    I am running tests against RC 5 - https://travis-ci.org/HyukjinKwon/incubator-livy/builds/448148672


---

[GitHub] incubator-livy issue #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    Merging to master.


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    @vanzin, weird.
    
    ```
    $ ./bin/spark-shell
    scala> sql("CREATE TABLE tblA(a int)")
    scala> spark.stop()
    ```
    
    ```
    $ rm -fr metastore_db
    $ rm -fr spark-warehouse
    $ rm -fr derby.log
    ```
    
    ```
    scala> import org.apache.spark.sql.SparkSession
    scala> val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
    scala> sql("CREATE TABLE tblA(a int)")
    ```
    
    These steps reproduce the same error all in Spark 2.3.0, 2.3.1 and 2.4.0 - I remember we never properly supported to stop and start Hive enabled support Spark session in this way. SparkR tests were failed due to the same reason. Hmm .. I am taking a look.


---

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r225185258
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    I am not 100% sure why the test becomes failed. The root cause seems that multiple Hive clients(?) access to derby and metastore as it creates Hive client multiple times(?).
    
    I spent a whole day and failed to find why and just decided to fix the test since it looks the problem is specific to the test scope so far.
    
    So, I fixed the test `testSparkSQLJob` to use Spark only and `testHiveJob` to use Hive support. This reminds of me that the current issue at SparkR where the tests can't start and stop multiple Hive support sessions (it ended up with one hive support session),
    



---

[GitHub] incubator-livy issue #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by codecov-io <gi...@git.apache.org>.
Github user codecov-io commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    # [Codecov](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=h1) Report
    > Merging [#121](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-livy/commit/0bcc290dd2bccef85645f1a35938c241974b5c17?src=pr&el=desc) will **increase** coverage by `0.09%`.
    > The diff coverage is `100%`.
    
    [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-livy/pull/121/graphs/tree.svg?width=650&token=0MkVbiUFwE&height=150&src=pr)](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=tree)
    
    ```diff
    @@             Coverage Diff              @@
    ##             master     #121      +/-   ##
    ============================================
    + Coverage     70.87%   70.96%   +0.09%     
    + Complexity      926      925       -1     
    ============================================
      Files           100      100              
      Lines          5510     5511       +1     
      Branches        829      829              
    ============================================
    + Hits           3905     3911       +6     
    + Misses         1065     1059       -6     
    - Partials        540      541       +1
    ```
    
    
    | [Impacted Files](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
    |---|---|---|---|
    | [...n/scala/org/apache/livy/utils/LivySparkUtils.scala](https://codecov.io/gh/apache/incubator-livy/pull/121/diff?src=pr&el=tree#diff-c2VydmVyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvbGl2eS91dGlscy9MaXZ5U3BhcmtVdGlscy5zY2FsYQ==) | `66.66% <100%> (+0.46%)` | `0 <0> (ø)` | :arrow_down: |
    | [...java/org/apache/livy/rsc/rpc/KryoMessageCodec.java](https://codecov.io/gh/apache/incubator-livy/pull/121/diff?src=pr&el=tree#diff-cnNjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9saXZ5L3JzYy9ycGMvS3J5b01lc3NhZ2VDb2RlYy5qYXZh) | `94.54% <0%> (-3.64%)` | `18% <0%> (-1%)` | |
    | [...ain/java/org/apache/livy/rsc/driver/RSCDriver.java](https://codecov.io/gh/apache/incubator-livy/pull/121/diff?src=pr&el=tree#diff-cnNjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9saXZ5L3JzYy9kcml2ZXIvUlNDRHJpdmVyLmphdmE=) | `77.96% <0%> (ø)` | `41% <0%> (ø)` | :arrow_down: |
    | [.../java/org/apache/livy/rsc/driver/SparkEntries.java](https://codecov.io/gh/apache/incubator-livy/pull/121/diff?src=pr&el=tree#diff-cnNjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9saXZ5L3JzYy9kcml2ZXIvU3BhcmtFbnRyaWVzLmphdmE=) | `65.62% <0%> (+1.56%)` | `10% <0%> (ø)` | :arrow_down: |
    | [...c/src/main/java/org/apache/livy/rsc/RSCClient.java](https://codecov.io/gh/apache/incubator-livy/pull/121/diff?src=pr&el=tree#diff-cnNjL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9saXZ5L3JzYy9SU0NDbGllbnQuamF2YQ==) | `83.85% <0%> (+3.72%)` | `26% <0%> (ø)` | :arrow_down: |
    
    ------
    
    [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=continue).
    > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
    > Powered by [Codecov](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=footer). Last update [0bcc290...0705f18](https://codecov.io/gh/apache/incubator-livy/pull/121?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).



---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r231342733
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    I'd rather not have to change this test. So it seems worth it to at least try to understand if it's an issue with Livy or Spark.
    
    If it's in Spark we may have to do this regardless, but it would be good to know that's the case.


---

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r225194172
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    This also particularly happens when the session that created Hive client is restarted with Hive support enabled multiple times(?). This was able to be reproduced in the local as well. I am still digging the root cause but so far the scope is specified to this test only.


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    Yup, let me rebase and get rid of [WIP] tag.


---

[GitHub] incubator-livy pull request #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/incubator-livy/pull/121#discussion_r225191745
  
    --- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
    @@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
     
       @Test
       public void testSparkSQLJob() throws Exception {
    -    runTest(true, new TestFunction() {
    +    runTest(true, false, new TestFunction() {
    --- End diff --
    
    I am not very sure on that but that's what's going on here in 2.4.0 - I doubly checked in my local and Travis CI.


---

[GitHub] incubator-livy issue #121: [LIVY-518][BUILD] Support Spark 2.4

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/incubator-livy/pull/121
  
    Great! Thank you, @HyukjinKwon , @mgaido91 , @vanzin !


---