Posted to reviews@spark.apache.org by ozzieba <gi...@git.apache.org> on 2018/01/15 19:06:29 UTC

[GitHub] spark pull request #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to...

GitHub user ozzieba opened a pull request:

    https://github.com/apache/spark/pull/20272

    [SPARK-23078] [CORE] allow Spark Thrift Server to run in Kubernetes Cluster mode

    ## What changes were proposed in this pull request?
    
    allow Spark Thrift Server to run in Kubernetes Cluster mode
    
    ## How was this patch tested?
    
    Manually

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ozzieba/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20272
    
----
commit 62ae925af514d8974b2a56066704c3ebcf53bf3d
Author: Oz Ben-Ami <oz...@...>
Date:   2018-01-15T19:02:25Z

    allow Spark Thrift Server to run in Kubernetes Cluster mode

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by deveshk0 <gi...@git.apache.org>.
Github user deveshk0 commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    I have built Spark with the same changes for the Thrift server. It is running fine for me.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to run in...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @foxish @felixcheung I wrote a [test](https://github.com/ozzieba/spark-integration/commit/2c77c7d4ec9d1a82b0c097073eb71ebdfbac15b7), but I'm having trouble with Minikube on Windows, and I can't run on a remote Kubernetes cluster. I'll try to get Minikube running in a different way and keep you updated.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by nrchakradhar <gi...@git.apache.org>.
Github user nrchakradhar commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    It's working for us in our test environment. No issues so far. The only thing to take care of is cleaning up the driver manually when the Thrift server is reinstalled or restarted for any reason.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to run in...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    I'm getting stuck on https://github.com/apache-spark-on-k8s/spark-integration/blob/master/integration-test/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala#L106; I will look again tomorrow.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Makes sense. The change LGTM.
    
    On Jan 29, 2018 10:23 AM, "Jiang Xingbo" <no...@github.com> wrote:
    
    > IIUC there was an issue in launching Thrift Server in YARN cluster mode,
    > and I'm not sure whether it has been fixed (maybe @jerryshao
    > <https://github.com/jerryshao> can kindly check that?) Anyway, that is not
    > a problem on the Spark side, so it should not affect Kubernetes cluster
    > mode.



---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by liyinan926 <gi...@git.apache.org>.
Github user liyinan926 commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @felixcheung I think so, and with https://github.com/apache/spark/pull/21748, users should be able to run the Thrift server in a pod.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    So how are you expected to contact the thrift server after it's up, and even figure out where it started? Isn't the container unreachable from the outside world unless you do some port mapping on the host?
    
    I see part of that is explained in the bug, but it sounds like this change needs better user documentation.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to run in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Is this aligned with the "in cluster client"? @foxish @mccheah 


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    If this can get a rebase, and maybe a few sentences in the k8s docs, I'll merge.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by deveshk0 <gi...@git.apache.org>.
Github user deveshk0 commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Is this coming with Spark 2.4?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    cc @liyinan926 Do you have some time to verify this?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @vanzin In general Kubernetes makes this super easy:
    - The most basic workflow is to take the driver pod name (which appears in the output of `spark-submit`, or can be found with `kubectl get pods`) and run `kubectl port-forward spark-app-driver-podname 31416:10000`, which will automatically [forward](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/) localhost:31416 to the pod's port 10000. Any JDBC application can then query `jdbc:hive2://localhost:31416`. This is the approach I took in the integration tests linked above.
    - Alternatively, any other application on the cluster can simply use spark-app-driver-podname:10000, which will be resolved by [kube-dns](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/).
    - For persistent external access, one can run `kubectl expose pod spark-app-driver-podname --type=NodePort --port 10000` to create a Kubernetes [Service](https://kubernetes.io/docs/concepts/services-networking/service/) which will accept connections on a particular port of *every* node in the cluster and send them to the driver's port 10000.
    - In a cloud environment, using `--type=LoadBalancer` in the above will create an external [load balancer](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/) that can be used to access Thrift from across the Internet.
    - One can also define a Service that automatically selects the driver using a user-specified label (e.g. `bin/spark-submit --conf spark.kubernetes.driver.label.mythriftserver=true` followed by `kubectl expose pod --selector=mythriftserver=true`)
    
    Perhaps the key point is that these are all core Kubernetes features, and not specific to Spark in any way. Users familiar with Kubernetes should be able to find the approach that works best for their environment and use case.
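
The port-forward and NodePort options described above can be sketched as a small shell script. This is a sketch under assumptions, not part of the PR: `spark-app-driver-podname` is the placeholder pod name from the comment, the helper names (`run`, `forward_thrift`, `expose_thrift`, `jdbc_url`) are hypothetical, and by default the script only prints the `kubectl` commands so it can be read without a cluster.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and the pod name is a placeholder.
DRIVER_POD="${DRIVER_POD:-spark-app-driver-podname}"
LOCAL_PORT="${LOCAL_PORT:-31416}"    # local port used in the comment above
THRIFT_PORT="${THRIFT_PORT:-10000}"  # Thrift server port used in the comment above

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Forward localhost:$LOCAL_PORT to the driver pod's Thrift port.
forward_thrift() {
  run kubectl port-forward "$DRIVER_POD" "${LOCAL_PORT}:${THRIFT_PORT}"
}

# Create a NodePort Service in front of the driver pod.
expose_thrift() {
  run kubectl expose pod "$DRIVER_POD" --type=NodePort --port "$THRIFT_PORT"
}

# JDBC URL to use while the port-forward is active.
jdbc_url() {
  echo "jdbc:hive2://localhost:${LOCAL_PORT}"
}

forward_thrift
expose_thrift
jdbc_url
```

Setting `DRY_RUN=0` runs the commands against the current `kubectl` context. `kubectl port-forward` blocks until interrupted, so in that mode it would normally be run in a separate terminal.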


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by suryag10 <gi...@git.apache.org>.
Github user suryag10 commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    > If this can get a rebase, and maybe a few sentences in the k8s docs, I'll merge.
    
    Hi, I have rebased this patch onto the latest master. The patch was also missing another fix needed to run the STS in K8s cluster mode. The PR is:
    
    https://github.com/apache/spark/pull/22433
    
    Can you please review it?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Ultimately I see this as being in the realm of Kubernetes knowledge, rather than Spark knowledge. These features are well-documented in Kubernetes documentation, and I am not sure there's a need to replicate that here. For a user new to both Spark and Kubernetes a combined tutorial could certainly be helpful, but that should likely be a more comprehensive undertaking.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by ah- <gi...@git.apache.org>.
Github user ah- commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Is anyone still looking at this? It seems like it should just work?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @foxish per SPARK-5176 and the associated [PR](https://github.com/apache/spark/pull/4137), it seems there was a technical issue with spark-internal that wouldn't allow Thrift Server to run in YARN cluster mode. It was deemed a minor fix at the time, so perhaps the changes were made in the meantime. Regardless, it does now work, at least on Kubernetes.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @andrewor14 @liancheng can you chime in?


---



[GitHub] spark pull request #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20272#discussion_r162108920
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -328,7 +328,7 @@ object SparkSubmit extends CommandLineUtils with Logging {
             printErrorAndExit("Cluster deploy mode is not applicable to Spark shells.")
           case (_, CLUSTER) if isSqlShell(args.mainClass) =>
             printErrorAndExit("Cluster deploy mode is not applicable to Spark SQL shell.")
    -      case (_, CLUSTER) if isThriftServer(args.mainClass) =>
    +      case (_, CLUSTER) if (clusterManager != KUBERNETES) && isThriftServer(args.mainClass) =>
             printErrorAndExit("Cluster deploy mode is not applicable to Spark Thrift server.")
    --- End diff --
    
    Can you elaborate on what might go wrong? I have been using it successfully to run queries; is there anything else that should be tested?
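
The effect of the changed guard in the diff above can be summarized with a toy model. This is not Spark code: the function name `thrift_cluster_guard` and the string values are illustrative, and the model assumes the main class is the Thrift server.

```shell
#!/bin/sh
# Toy model of the patched guard, not Spark code. Assumes the main class is
# the Thrift server. "reject" marks the case where spark-submit would print
# "Cluster deploy mode is not applicable to Spark Thrift server.";
# "allow" only means this particular check passes.
thrift_cluster_guard() {
  cluster_manager="$1"  # illustrative values: yarn, mesos, standalone, kubernetes
  deploy_mode="$2"      # client or cluster
  if [ "$deploy_mode" = "cluster" ] && [ "$cluster_manager" != "kubernetes" ]; then
    echo "reject"
  else
    echo "allow"
  fi
}

# Show the outcome for cluster deploy mode on each cluster manager.
for cm in yarn mesos standalone kubernetes; do
  echo "$cm/cluster: $(thrift_cluster_guard "$cm" cluster)"
done
```

Before the patch, the condition had no cluster-manager term, so every cluster-mode submission of the Thrift server was rejected; the patch exempts Kubernetes only.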


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    It's really not that much work to write a couple of sentences about this in the running-on-k8s docs, even if you're just pointing people to the Kubernetes documentation. Is that really so controversial?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    IIUC there was an issue in launching Thrift Server in YARN cluster mode, and I'm not sure whether it has been fixed (maybe @jerryshao can kindly check that?) Anyway, that is not a problem on the Spark side, so it should not affect Kubernetes cluster mode.


---



[GitHub] spark pull request #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20272#discussion_r161979319
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -328,7 +328,7 @@ object SparkSubmit extends CommandLineUtils with Logging {
             printErrorAndExit("Cluster deploy mode is not applicable to Spark shells.")
           case (_, CLUSTER) if isSqlShell(args.mainClass) =>
             printErrorAndExit("Cluster deploy mode is not applicable to Spark SQL shell.")
    -      case (_, CLUSTER) if isThriftServer(args.mainClass) =>
    +      case (_, CLUSTER) if (clusterManager != KUBERNETES) && isThriftServer(args.mainClass) =>
             printErrorAndExit("Cluster deploy mode is not applicable to Spark Thrift server.")
    --- End diff --
    
    I haven't dug through the code, but this might break if anything assumes that the Thrift server is not running in cluster mode.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    Great, how about explaining that in some place more visible to Spark users than a PR on GitHub?


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    > IIUC there was an issue in launching Thrift Server in YARN cluster mode, and I'm not sure whether it has been fixed (maybe @jerryshao can kindly check that?)
    
    Sorry, I cannot remember the issue. YARN cluster mode doesn't support the Thrift server.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @jiangxb1987 Is there any specific owner of the thrift server that we can ping here? The testing looks good - so, all we're waiting for is confirmation from them on the original intent behind disallowing the thrift server in cluster mode.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to run in...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @ozzieba, can we add a test to our integration test set to ensure this works? It's rather late in the Spark 2.3 release, and I'd be apprehensive about adding things that haven't been extensively tested.


---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] [K8s] allow Spark Thrift Server to ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    We still need to run tests: https://github.com/apache/spark/pull/20272#pullrequestreview-108271893



---



[GitHub] spark issue #20272: [SPARK-23078] [CORE] allow Spark Thrift Server to run in...

Posted by ozzieba <gi...@git.apache.org>.
Github user ozzieba commented on the issue:

    https://github.com/apache/spark/pull/20272
  
    @foxish @felixcheung I have now verified that my test runs successfully: https://github.com/apache-spark-on-k8s/spark-integration/pull/38


---


