Posted to dev@zeppelin.apache.org by GitBox <gi...@apache.org> on 2020/04/10 06:00:24 UTC

[GitHub] [zeppelin] Leemoonsoo opened a new pull request #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Leemoonsoo opened a new pull request #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728
 
 
   ### What is this PR for?
   When Zeppelin runs on Kubernetes, the SparkUI URL has to be generated dynamically, because the Kubernetes Service name for the Spark interpreter Pod is generated at runtime, and an Ingress controller or reverse proxy routes traffic to the SparkUI.
   
   The problem is that, depending on the Ingress or reverse proxy configuration, a different SparkUI URL format might be required.
   
   Currently, the generated URL format is hardcoded to "//<PORT>-<SERVICE_NAME>.<SERVICE_DOMAIN>". Letting the user set 'zeppelin.spark.uiWebUrl' to a static value doesn't help at all, because the URL is only decided at runtime.
   
   This PR accepts a [jinja template](https://jinja.palletsprojects.com/en/2.11.x/) string in 'zeppelin.spark.uiWebUrl' and binds 3 variables: 'PORT', 'SERVICE_NAME' and 'SERVICE_DOMAIN'. Therefore, any URL pattern required by the Ingress or reverse proxy can be specified. The variables have the following values:
   
    * PORT - the Spark UI port.
    * SERVICE_NAME - the [Service](https://kubernetes.io/docs/concepts/services-networking/service/) name for the Spark interpreter Pod.
    * SERVICE_DOMAIN - the value of the SERVICE_DOMAIN env variable.
   
   For example, when the Spark UI is running on port '4040', the Service name for the Spark interpreter Pod is 'spark-wcoyqq', and SERVICE_DOMAIN is 'my.domain.io', the value
   
   ```
   https://port-{{PORT}}-{{SERVICE_NAME}}.{{SERVICE_DOMAIN}}
   ```
   in the 'zeppelin.spark.uiWebUrl' property will generate a Spark UI link with the address
   
   ```
   https://port-4040-spark-wcoyqq.my.domain.io
   ```
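The placeholder substitution described above can be illustrated with a small Python sketch. It is not Zeppelin's implementation: a real Jinja engine also handles filters, conditionals and escaping, while this sketch only replaces `{{NAME}}` placeholders. The names `render_ui_url`, `spark_ui_url` and `LEGACY_TEMPLATE` are hypothetical, and falling back to the previously hardcoded format when the property is empty is an assumption based on the PR stating that there are no breaking changes.

```python
import re

# Matches {{NAME}} placeholders, tolerating spaces inside the braces ({{ PORT }}).
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# Assumption: the previously hardcoded format, expressed as a template.
LEGACY_TEMPLATE = "//{{PORT}}-{{SERVICE_NAME}}.{{SERVICE_DOMAIN}}"


def render_ui_url(template, variables):
    """Replace every {{NAME}} in template with str(variables[NAME]).

    An unknown name raises KeyError instead of silently rendering an empty
    string, so a typo in the property value is easy to spot.
    """
    return _PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def spark_ui_url(property_value, variables):
    """Render zeppelin.spark.uiWebUrl, or the legacy format if it is unset."""
    template = property_value or LEGACY_TEMPLATE
    return render_ui_url(template, variables)


# Values from the example above.
EXAMPLE_VARIABLES = {
    "PORT": 4040,
    "SERVICE_NAME": "spark-wcoyqq",
    "SERVICE_DOMAIN": "my.domain.io",
}

print(spark_ui_url("https://port-{{PORT}}-{{SERVICE_NAME}}.{{SERVICE_DOMAIN}}", EXAMPLE_VARIABLES))
print(spark_ui_url(None, EXAMPLE_VARIABLES))
```

Raising on an unknown variable is a deliberate choice in this sketch: Jinja's default `Undefined` renders as an empty string, which would produce a silently broken link.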
   
   
   
   ### What type of PR is it?
   Improvement
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/ZEPPELIN-4748
   
   ### Questions:
    * Do the license files need an update? no
    * Are there breaking changes for older versions? no
    * Does this need documentation? yes
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zeppelin] asfgit closed pull request #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728
 
 
   


[GitHub] [zeppelin] Leemoonsoo commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Leemoonsoo commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728#issuecomment-612965590
 
 
   Since I think this PR can still be useful with the possible future improvements discussed above, and it does not change the current behavior, I'm merging this to master and branch-0.9 if there are no further comments.
   


[GitHub] [zeppelin] Leemoonsoo edited a comment on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Leemoonsoo edited a comment on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728#issuecomment-612094455
 
 
   Regarding using a Deployment instead of a Pod, I generally agree, but I think we need to take one thing into consideration.
   
   Some interpreters (like markdown, jdbc) are stateless, so their Pod can silently be re-created.
   However, when the Pod of a stateful interpreter like Spark or Python is re-created, it loses all its state, and users need to know about that.
   
   So, unless we have some mechanism to notify the user about Pod re-creation, I think it is not a bad idea to let the Pod disappear and let the user get an error message about it.
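The trade-off above can be written down as a small decision function. This is a hypothetical Python sketch: `on_pod_lost`, the `PodLostAction` enum and the set of stateless interpreter names are illustrative and do not exist in Zeppelin; the classification simply follows the examples given in the comment (markdown and jdbc stateless, Spark and Python stateful).

```python
from enum import Enum


class PodLostAction(Enum):
    RECREATE_SILENTLY = "recreate silently"
    RECREATE_AND_NOTIFY = "recreate and tell the user the state is gone"
    REPORT_ERROR = "leave the Pod gone and report an error"


# Hypothetical classification following the comment's examples:
# markdown ("md") and jdbc keep no session state; spark and python do.
STATELESS_INTERPRETERS = frozenset({"md", "jdbc"})


def on_pod_lost(interpreter, can_notify_user=False):
    """Decide what to do when an interpreter Pod disappears."""
    if interpreter in STATELESS_INTERPRETERS:
        # Nothing is lost, so the user does not need to notice.
        return PodLostAction.RECREATE_SILENTLY
    if can_notify_user:
        # A stateful Pod may be recreated only if the user learns that
        # their variables, cached data, etc. are gone.
        return PodLostAction.RECREATE_AND_NOTIFY
    # No notification mechanism: an explicit error is better than a fresh,
    # empty interpreter that silently pretends nothing happened.
    return PodLostAction.REPORT_ERROR


for name in ("jdbc", "spark"):
    print(name, "->", on_pod_lost(name).value)
```

The `can_notify_user` flag stands in for the notification mechanism the comment says is currently missing; without it, stateful interpreters fall through to the error case.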
   
   What do you think?


[GitHub] [zeppelin] Leemoonsoo commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Leemoonsoo commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728#issuecomment-612091801
 
 
   Hi @Reamer. Thanks for the comment.
   
   The proxy stuff via nginx and go-dnsmasq is there to make Zeppelin on Kubernetes work out of the box, including SparkUI, in a minikube environment. The current out-of-the-box configuration does not include an Ingress controller.
   
   So, it is up to the user how to configure the Ingress controller. One way (a) is, as you did, to let each interpreter create its own Ingress configuration on creation; another way (b) is to let one Ingress controller accept all traffic and have the nginx proxy route the traffic to the individual interpreters.
   
   I think it is better to let the user decide between (a) and (b), because
     - some Ingress controllers (like a cloud provider's Ingress controller) take a few minutes to be created, and some do not (like the nginx Ingress controller)
     - some Ingress controllers (like a cloud provider's Ingress controller) incur additional cost on each creation, and some do not (like the nginx Ingress controller)
     - the Ingress setup needs to be customized for each user (host, path, domain name, subdomain name, TLS, etc.)
   
   So, how about making the out-of-box configuration ready for both (a) and (b), and letting the user decide which one to use?
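With option (b), a single entry point receives all traffic and the proxy has to work out which interpreter Service a request is meant for. Below is a minimal Python sketch of that routing decision, assuming host names follow the `<PORT>-<SERVICE_NAME>.<SERVICE_DOMAIN>` format from the PR description. `upstream_for` is a hypothetical name, and this is not the nginx configuration Zeppelin ships.

```python
import re

# Hypothetical host format for option (b), matching the legacy SparkUI URL
# format <PORT>-<SERVICE_NAME>.<SERVICE_DOMAIN>. Service names are DNS labels
# (lowercase letters, digits and '-'), so they cannot contain a '.'.
_HOST = re.compile(r"^(?P<port>[0-9]+)-(?P<service>[a-z0-9-]+)\.(?P<domain>.+)$")


def upstream_for(host_header):
    """Map a Host header to the in-cluster URL the proxy should forward to.

    Returns None when the host does not address an interpreter; in this
    sketch such requests would go to the Zeppelin server itself.
    """
    host = host_header.split(":", 1)[0].lower()  # drop an optional :port
    match = _HOST.match(host)
    if match is None:
        return None
    return "http://{}:{}".format(match.group("service"), match.group("port"))


print(upstream_for("4040-spark-wcoyqq.my.domain.io"))
print(upstream_for("zeppelin.my.domain.io"))
```

With option (a), this lookup disappears, because each interpreter gets its own Ingress rule, at the cost of creating one Ingress per interpreter.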
   
   
   


[GitHub] [zeppelin] Leemoonsoo commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Leemoonsoo commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728#issuecomment-612435229
 
 
   > You misunderstood me a bit about how I want to use K8s deployments. The interpreter should run in a pod, only the Zeppelin server should be controlled by a deployment. When the Zeppelin server pod dies, the deployment controller creates a new Zeppelin server pod. Child K8s resources created by the old pod should be deleted, including interpreter pods, roles, role bindings, services, etc.
   > 
   > I think it is quite complicated to recreate a stateful Interpreter pod with a K8s-deployment, because it's internally controlled by K8s and the Zeppelin server doesn't know about it.
   > To simplify this, we should trigger the creation of a new pod from the Zeppelin server.
   > 
   > Is it possible to change the out-of-box configuration (b) to a Zeppelin server Deployment?
   
   Ah right, I misunderstood. I agree the Zeppelin server needs to be a Deployment. Since you are already using a Deployment controller, can you make a pull request for this?
   
   > I will add version (a), if I have time.
   
   That would be great!


[GitHub] [zeppelin] Reamer commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Reamer commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728#issuecomment-611928956
 
 
   Hi @Leemoonsoo,
   Can we completely remove the proxy stuff via nginx and go-dnsmasq in the Zeppelin pod and use K8s resources instead?
   
   I tried to work around some things with the goal of a cleaner K8s deployment.
   At the moment I use this [interpreter-spec](https://gist.github.com/Reamer/c4d777ab76aae1bbbf1c64fea59a8a45). It creates an Ingress and a Service resource for every interpreter pod. Because of that, I don't need the nginx proxy and dnsmasq.
   I disabled SparkUI in Zeppelin completely, because the URL in the zeppelin-server GUI is wrong, but I can still access the SparkUI manually via the Ingress.
   
   I use a [K8s-Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) to manage the zeppelin server. The current [pod example](https://github.com/apache/zeppelin/blob/master/k8s/zeppelin-server.yaml#L77-L166) is unsatisfactory, because if a [Node dies](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifetime) the pod also dies.
   My zeppelin-server deployment looks like this: [GitHub-Gist](https://gist.github.com/Reamer/bac4e568064809160c969034ed5a4b18).
   Because of the deployment, this [condition in the Zeppelin code](https://github.com/apache/zeppelin/blob/98b26a12f4ebe73e9ab9aa011e9a8a56731addbe/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sStandardInterpreterLauncher.java#L112) isn't true anymore. Therefore, I'm using [a small workaround](https://gist.github.com/Reamer/c4d777ab76aae1bbbf1c64fea59a8a45#file-interpreter-spec-yml-L51).
   
   We should be able to render a correct web UI URL in the Zeppelin GUI, because the zeppelin-server itself creates the pod, service and ingress objects.
   
   I hope these are not too many changes and that you can follow my idea.
   
   What do you think about that?


[GitHub] [zeppelin] Reamer commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes

Reamer commented on issue #3728: [ZEPPELIN-4748] Format Spark web ui url dynamically on Kubernetes
URL: https://github.com/apache/zeppelin/pull/3728#issuecomment-612383791
 
 
   > So, how about make out-of-box configuration ready for both (a) and (b), and let user decide which one to use?
   
   I totally agree with you. Thanks for your explanations. I host K8s on my own hardware, so I wasn't aware of the additional costs and time in public clouds.
   
   You misunderstood me a bit about how I want to use K8s deployments. The interpreter should run in a pod, only the Zeppelin server should be controlled by a deployment. When the Zeppelin server pod dies, the deployment controller creates a new Zeppelin server pod. Child K8s resources created by the old pod should be deleted, including interpreter pods, roles, role bindings, services, etc.
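One way Kubernetes can take care of this cleanup is owner references: if every child resource lists the Zeppelin server Pod in `metadata.ownerReferences`, the garbage collector deletes those children once the old server Pod object is deleted. The Python sketch below only builds the manifest fragment. The helper names, the Pod name and the UID are hypothetical, and the server would need to know its own Pod UID (for example via the Downward API field `metadata.uid`). It is one possible approach under these assumptions, not necessarily how Zeppelin implements it.

```python
import copy


def server_owner_reference(pod_name, pod_uid):
    """ownerReferences entry that ties a child resource to the server Pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "name": pod_name,
        "uid": pod_uid,
        # The server Pod does not manage the child like a controller would;
        # the reference only makes the child's lifetime follow the Pod's.
        "controller": False,
        "blockOwnerDeletion": False,
    }


def owned_by_server(manifest, pod_name, pod_uid):
    """Return a copy of manifest with the server Pod added as an owner.

    Owner references cannot cross namespaces, so the child must be created
    in the same namespace as the Zeppelin server Pod.
    """
    result = copy.deepcopy(manifest)
    metadata = result.setdefault("metadata", {})
    metadata.setdefault("ownerReferences", []).append(
        server_owner_reference(pod_name, pod_uid)
    )
    return result


# Hypothetical interpreter Service, server Pod name and UID.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "spark-wcoyqq"},
}
print(owned_by_server(service, "zeppelin-server-0", "hypothetical-uid")["metadata"])
```

The same helper would be applied to the interpreter Pod, Role and RoleBinding manifests before they are created, so a replacement server Pod starts without leftovers from its predecessor.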
   
   I think it is quite complicated to recreate a stateful Interpreter pod with a K8s-deployment, because it's internally controlled by K8s and the Zeppelin server doesn't know about it.
   To simplify this, we should trigger the creation of a new pod from the Zeppelin server.
   
   Is it possible to change the out-of-box configuration (b) to a Zeppelin server Deployment?
   
   I will add version (a), if I have time.
   
   Best Regards
   Reamer

