Posted to reviews@toree.apache.org by lresende <gi...@git.apache.org> on 2017/07/17 23:00:09 UTC

[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

GitHub user lresende opened a pull request:

    https://github.com/apache/incubator-toree/pull/128

    [TOREE-425] Force sparkContext initialization in cluster mode

    Kernels running in yarn-cluster mode (when launched via spark-submit)
    must initialize a SparkContext in order for the Spark Yarn code
    to register the application as RUNNING
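
    As a rough illustration of what that forced initialization amounts to, here is
    a minimal sketch (the cluster-mode check and the helper name are assumptions
    for illustration, not necessarily the code in this pull request):

        import org.apache.spark.{SparkConf, SparkContext}

        // Eagerly create the SparkContext when running in yarn-cluster mode, so the
        // YARN ApplicationMaster can register the application and report it as RUNNING.
        def forceSparkContextIfClusterMode(conf: SparkConf): Unit = {
          val clusterMode =
            conf.get("spark.master", "") == "yarn" &&
              conf.get("spark.submit.deployMode", "") == "cluster"
          if (clusterMode) {
            // Without an eager context, creation stays lazy and YARN never sees
            // the application transition to RUNNING.
            SparkContext.getOrCreate(conf)
          }
        }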

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lresende/incubator-toree toree-425

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-toree/pull/128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #128
    
----
commit f80a5484eb0a100b66227ee1e3b4968c568a978b
Author: Luciano Resende <lr...@apache.org>
Date:   2017-07-17T22:56:56Z

    [TOREE-425] Force sparkContext initialization in cluster mode
    
    Kernels running in yarn-cluster mode (when launched via spark-submit)
    must initialize a SparkContext in order for the Spark Yarn code
    to register the application as RUNNING

----



[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/128#discussion_r127858771
  
    --- Diff: kernel/src/main/scala/org/apache/toree/kernel/api/Kernel.scala ---
    @@ -20,8 +20,11 @@ package org.apache.toree.kernel.api
     import java.io.{InputStream, PrintStream}
     import java.net.URI
     import java.util.concurrent.{ConcurrentHashMap, TimeUnit, TimeoutException}
    +
    --- End diff --
    
    Nit: unnecessary changes to imports make backports and branch maintenance harder. I'd prefer not to have these changes unless there is a stated style that this change conforms to.



[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/128#discussion_r127858852
  
    --- Diff: kernel/src/main/scala/org/apache/toree/boot/layer/ComponentInitialization.scala ---
    @@ -94,6 +98,18 @@ trait StandardComponentInitialization extends ComponentInitialization {
     
       }
     
    +  def initializeSparkContext(config:Config, kernel:Kernel) = {
    +    // TOREE:425 Spark cluster mode requires a context to be initialized before
     +    // it registers the application as Running
    +    if ( SparkUtils.isSparkClusterMode(kernel.sparkConf) ) {
    --- End diff --
    
    Nit: Are the spaces needed? I think this is non-standard for Scala or Java projects.
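
    For reference, the more conventional Scala form the nit points at would simply
    drop the padding inside the parentheses (a one-line sketch, not a statement of
    project style):

        if (SparkUtils.isSparkClusterMode(kernel.sparkConf)) {
          // ...
        }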



[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/128#discussion_r127859204
  
    --- Diff: kernel/src/main/scala/org/apache/toree/boot/layer/ComponentInitialization.scala ---
    @@ -82,6 +84,8 @@ trait StandardComponentInitialization extends ComponentInitialization {
     
         initializePlugins(config, pluginManager)
     
    +    initializeSparkContext(config, kernel)
    --- End diff --
    
    Should this be here, or in the kernel's initialization? I think we eventually want Toree to not require Spark, in which case we would have a SparkKernel and a regular Kernel. Then it would be the responsibility of the SparkKernel to detect that the application is in cluster mode and initialize the context. Doing this sooner rather than later would avoid more changes.
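
    A compilable sketch of that split, with the caveat that SparkKernel does not
    exist in Toree today and these class shapes are only illustrative:

        import org.apache.spark.{SparkConf, SparkContext}

        // Spark-free base kernel: protocol handling and interpreters, no Spark types.
        class Kernel {
          def start(): Unit = ()
        }

        // Spark-aware kernel: owns all Spark wiring, including the cluster-mode
        // eager initialization this PR currently puts in ComponentInitialization.
        class SparkKernel(sparkConf: SparkConf) extends Kernel {
          lazy val sparkContext: SparkContext = SparkContext.getOrCreate(sparkConf)

          override def start(): Unit = {
            super.start()
            // Force the lazy context in cluster mode so YARN marks the app RUNNING.
            if (sparkConf.get("spark.submit.deployMode", "") == "cluster") {
              sparkContext
            }
          }
        }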



[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

Posted by lresende <gi...@git.apache.org>.
Github user lresende closed the pull request at:

    https://github.com/apache/incubator-toree/pull/128



[GitHub] incubator-toree pull request #128: [TOREE-425] Force sparkContext initializa...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/128#discussion_r127858615
  
    --- Diff: kernel/src/main/scala/org/apache/toree/kernel/api/Kernel.scala ---
    @@ -414,13 +417,15 @@ class Kernel (
               Await.result(sessionFuture, Duration(100, TimeUnit.MILLISECONDS))
             } catch {
               case timeout: TimeoutException =>
    -            // getting the session is taking a long time, so assume that Spark
    -            // is starting and print a message
    -            display.content(
    -              MIMEType.PlainText, "Waiting for a Spark session to start...")
    +            // in cluster mode, the sparkContext is forced to initialize
    +            if (SparkUtils.isSparkClusterMode(defaultSparkConf) == false) {
    --- End diff --
    
    Instead of preventing the message from being sent, I think this should update the logic so that it uses the second case. That case just creates a new context, without the futures or Await calls.
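
    To make that concrete, here is a compilable toy reconstruction of the idea;
    every name below is a stand-in for the kernel's real session plumbing, not
    Toree's actual API:

        import java.util.concurrent.{TimeUnit, TimeoutException}
        import scala.concurrent.{Await, Future}
        import scala.concurrent.ExecutionContext.Implicits.global
        import scala.concurrent.duration.Duration

        object SessionSketch {
          type Session = AnyRef
          def isClusterMode: Boolean =
            sys.props.get("spark.submit.deployMode").contains("cluster")
          def createSessionDirectly(): Session = new Object   // the "second case"
          val sessionFuture: Future[Session] = Future(new Object)

          def obtainSession(): Session =
            if (isClusterMode) {
              // Cluster mode: build the session directly, no Future or Await involved,
              // so there is nothing to time out on and no message to suppress.
              createSessionDirectly()
            } else {
              try Await.result(sessionFuture, Duration(100, TimeUnit.MILLISECONDS))
              catch {
                case _: TimeoutException =>
                  // Interactive case: the session is slow to appear, so tell the
                  // user and keep waiting.
                  println("Waiting for a Spark session to start...")
                  Await.result(sessionFuture, Duration.Inf)
              }
            }
        }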

