Posted to dev@spark.apache.org by Gurvinder Singh <gu...@uninett.no> on 2016/05/21 16:30:08 UTC

spark on kubernetes

Hi,

I am currently working on deploying Spark on Kubernetes (K8s) and it is
working fine. I am running Spark in standalone mode and checkpointing
the state to a shared system, so if the master fails, K8s restarts it,
it recovers the earlier state from the checkpoint, and things just work.
I have an issue with the worker and application UI links in the Spark
master Web UI. In brief, the Kubernetes service model allows me to
expose the master service to the internet, but accessing the
application/worker UIs is not possible, as I would have to expose each
of them individually, and given that I can have multiple applications,
that becomes hard to manage.

One solution could be for the master to act as a reverse proxy for
accessing information/state/logs from applications/workers. The master
already knows their endpoints, because applications and workers
register with it, so when a user initiates a request to access this
information, the master can proxy the request to the corresponding
endpoint.
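
As a rough illustration of the idea above, here is a minimal sketch in
Python (not Spark code) of the bookkeeping the master would need: a
table of UI endpoints that is filled in at registration time and
consulted when a request comes in. The class name EndpointRegistry, its
methods, and the ID and address used below are all hypothetical.

```python
from typing import Dict, Optional


class EndpointRegistry:
    """Hypothetical table of UI endpoints, filled in at registration time."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, str] = {}

    def register(self, component_id: str, host: str, port: int) -> None:
        # Called when a worker or application registers with the master.
        self._endpoints[component_id] = f"http://{host}:{port}"

    def unregister(self, component_id: str) -> None:
        # Applications come and go, so entries must be removable.
        self._endpoints.pop(component_id, None)

    def lookup(self, component_id: str) -> Optional[str]:
        # Returns None for unknown IDs so the caller can answer with a 404.
        return self._endpoints.get(component_id)


# Made-up ID and address, for illustration only.
registry = EndpointRegistry()
registry.register("worker-a", "10.0.0.5", 8081)
print(registry.lookup("worker-a"))
```

The point of the sketch is only that the lookup table already exists
implicitly on the master, since every worker and application registers
with it.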

So I am wondering whether someone has already done work in this
direction; if so, it would be great to know. If not, would the community
be interested in such a feature? If yes, how and where should I get
started? It would be helpful to have some guidance before I start
working on this.

Kind Regards,
Gurvinder

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: spark on kubernetes

Posted by Gurvinder Singh <gu...@uninett.no>.
On 05/22/2016 08:30 AM, Sun Rui wrote:
> I think “reverse proxy” is beneficial to monitoring a cluster in a
> secure way. This feature is not only desired for Spark on standalone,
> but also Spark on YARN, and also projects other than spark.
I think to secure Spark you can use any reverse proxy out there, e.g.
Knox, or something lightweight such as nginx/node-http-proxy, or pick
one in your favorite language. There is even oauth2_proxy
(https://github.com/bitly/oauth2_proxy), which can secure, for example,
the Spark UI using GitHub/Google accounts.

But the issue here is that the Spark master UI page has links to
information about workers which point to their internal IP addresses,
so you need to either be on a VPN or be on the same network to get the
worker information, e.g. logs. The same goes for the application UI, as
the driver is inside the Spark cluster network.

So the idea is that the Spark master UI can act as a reverse proxy to
get this information. For example:

Suppose a worker with ID worker1 is running at 10.2.3.4:8081. In the
current situation, a user who accesses the master UI and wants to see
information from worker1 needs to either connect to a VPN or have
10.2.3.4 reachable from his/her machine. So the proposal is to add
functionality to the Spark master UI such that the link to the worker
with ID worker1 looks like spark-master.com/worker1. When the user
accesses this link, the master proxies the request to 10.2.3.4:8081 and
relays the response back, so the user does not need to be on the same
network.
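
The routing step in this example can be sketched as follows. This is a
minimal Python illustration, not Spark or Jetty code: the function name
resolve_target, the ENDPOINTS table, and the /logs path are
hypothetical, and a real proxy would also have to forward headers and
stream the response body.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical table; in the proposal the master would fill this in
# as workers and applications register with it.
ENDPOINTS: Dict[str, str] = {"worker1": "http://10.2.3.4:8081"}


def resolve_target(request_url: str, endpoints: Dict[str, str]) -> Optional[str]:
    """Map a public master URL to the internal URL it should be proxied to."""
    parts = urlsplit(request_url)
    # The first path segment is the component ID, the rest is passed through.
    component_id, _, rest = parts.path.lstrip("/").partition("/")
    base = endpoints.get(component_id)
    if base is None:
        return None  # unknown ID: the caller would answer with a 404
    target = f"{base}/{rest}"
    if parts.query:
        target += f"?{parts.query}"
    return target


print(resolve_target("http://spark-master.com/worker1", ENDPOINTS))
print(resolve_target("http://spark-master.com/worker1/logs?x=1", ENDPOINTS))
```

Keying the route on the first path segment keeps every worker and
application behind the single public master address.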

This will really simplify Spark UI access in the general case too,
since you will need to expose only one IP to the public.
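
For only one address to be exposed, the links that the master page
renders would also have to point back at the master instead of at
internal addresses. The Python sketch below shows one naive way to do
that with plain string replacement. The function name rewrite_links and
the sample HTML are hypothetical, and a real implementation would more
likely generate the proxied links directly than rewrite rendered HTML.

```python
from typing import Dict


def rewrite_links(html: str, id_by_address: Dict[str, str]) -> str:
    """Replace internal http://host:port prefixes with master-relative paths."""
    for address, component_id in id_by_address.items():
        # Naive prefix replacement; assumes addresses appear as http://host:port.
        html = html.replace(f"http://{address}", f"/{component_id}")
    return html


# Sample markup is made up; the address and ID come from the example above.
page = '<a href="http://10.2.3.4:8081/logs">worker1 logs</a>'
print(rewrite_links(page, {"10.2.3.4:8081": "worker1"}))
```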

I have done a preliminary study of the code, and it seems Spark uses
Jetty for its UI, and Jetty has a ProxyServlet which can serve this
purpose. So it would be good to know if the community is interested in
having such a feature and can get together to add it then :)

- Gurvinder
> 
> Maybe Apache Knox can help you. Not sure how Knox can integrate with Spark.
>> On May 22, 2016, at 00:30, Gurvinder Singh <gurvinder.singh@uninett.no
>> <ma...@uninett.no>> wrote:
>>
>> standalone mod
> 




Re: spark on kubernetes

Posted by Sun Rui <su...@163.com>.
I think “reverse proxy” is beneficial to monitoring a cluster in a secure way. This feature is not only desired for Spark on standalone, but also Spark on YARN, and also projects other than spark.

Maybe Apache Knox can help you. Not sure how Knox can integrate with Spark.
> On May 22, 2016, at 00:30, Gurvinder Singh <gu...@uninett.no> wrote:
> 
> standalone mod


Re: spark on kubernetes

Posted by Gurvinder Singh <gu...@uninett.no>.
OK, I created this issue: https://issues.apache.org/jira/browse/SPARK-15487
Please comment on it, and also let me know if anyone wants to
collaborate on implementing it. It's my first contribution to Spark, so
it will be exciting.

- Gurvinder
On 05/23/2016 07:55 PM, Gurvinder Singh wrote:
> On 05/23/2016 07:18 PM, Radoslaw Gruchalski wrote:
>> Sounds surprisingly close to this:
>> https://github.com/apache/spark/pull/9608
>>
> I might have overlooked it but bridge mode work appears to make Spark
> work with docker containers and able to communicate with them when
> running on more than one machines.
> 
> Here I am trying to enable getting information from Spark UI
> irrespective of Spark running in containers or not. Spark UI's link to
> workers and application drivers are pointing to internal/protected
> network. So to get this information from user's machine, he/she has to
> connect to VPN. Therefore the proposal is to make Spark master UI
> reverse proxy this information back to user. So only Spark master UI
> needs to be opened up to internet and there is no need to change
> anything else how Spark runs internally either in Standalone mode, Mesos
> or in containers on kubernetes.
> 
> - Gurvinder
>> I can ressurect the work on the bridge mode for Spark 2. The reason why
>> the work on the old one was suspended was because Spark was going
>> through so many changes at that time that a lot of work done, was wiped
>> out by the changes towards 2.0.
>>
>> I know that Lightbend was also interested in having bridge mode.
>>
>> –
>> Best regards,
>> Radek Gruchalski
>> radek@gruchalski.com <ma...@gruchalski.com>
>> de.linkedin.com/in/radgruchalski
>>
>> *Confidentiality:
>> *This communication is intended for the above-named person and may be
>> confidential and/or legally privileged.
>> If it has come to you in error you must take no action based on it, nor
>> must you copy or show it to anyone; please delete/destroy and inform the
>> sender immediately.
>>
>>
>> On May 23, 2016 at 7:14:51 PM, Timothy Chen (tnachen@gmail.com
>> <ma...@gmail.com>) wrote:
>>
>>> This will also simplify Mesos users as well, DCOS has to work around
>>> this with our own proxying.
>>>
>>> Tim
>>>
>>> On Sun, May 22, 2016 at 11:53 PM, Gurvinder Singh
>>> <gu...@uninett.no> wrote:
>>>> Hi Reynold,
>>>>
>>>> So if that's OK with you, can I go ahead and create JIRA for this. As it
>>>> seems this feature is missing currently and can benefit not just for
>>>> kubernetes users but in general Spark standalone mode users too.
>>>>
>>>> - Gurvinder
>>>> On 05/22/2016 12:49 PM, Gurvinder Singh wrote:
>>>>> On 05/22/2016 10:23 AM, Sun Rui wrote:
>>>>>> If it is possible to rewrite URL in outbound responses in Knox or other reverse proxy, would that solve your issue?
>>>>> Any process which can keep track of workers and application drivers IP
>>>>> addresses and route traffic to those will work. Considering Spark Master
>>>>> does exactly this due to all workers and application has to register to
>>>>> the master, therefore I propose master to be the place to add such a
>>>>> functionality.
>>>>>
>>>>> I am not aware with Knox capabilities but Nginx or any other normal
>>>>> reverse proxy will not be able to this on its own due to dynamic nature
>>>>> of application drivers and to some extent workers too.
>>>>>
>>>>> - Gurvinder
>>>>>>> On May 22, 2016, at 14:55, Gurvinder Singh <gu...@uninett.no> wrote:
>>>>>>>
>>>>>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>>>>>>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>>>>>>>>
>>>>>>> Yeah kubernetes has ingress controller which can act the L7 load
>>>>>>> balancer and router traffic to Spark UI in this case. But I am referring
>>>>>>> to link present in UI to worker and application UI. Replied in the
>>>>>>> detail to Sun Rui's mail where I gave example of possible scenario.
>>>>>>>
>>>>>>> - Gurvinder


Re: spark on kubernetes

Posted by Gurvinder Singh <gu...@uninett.no>.
On 05/23/2016 07:18 PM, Radoslaw Gruchalski wrote:
> Sounds surprisingly close to this:
> https://github.com/apache/spark/pull/9608
> 
I might have overlooked something, but the bridge mode work appears to
make Spark work with Docker containers and able to communicate with
them when running on more than one machine.

Here I am trying to enable getting information from the Spark UI
irrespective of whether Spark is running in containers or not. The Spark
UI's links to workers and application drivers point to an
internal/protected network, so to get this information from the user's
machine, he/she has to connect to a VPN. Therefore the proposal is to
make the Spark master UI reverse proxy this information back to the
user. Then only the Spark master UI needs to be opened up to the
internet, and there is no need to change anything else about how Spark
runs internally, whether in standalone mode, on Mesos, or in containers
on Kubernetes.

- Gurvinder
> I can ressurect the work on the bridge mode for Spark 2. The reason why
> the work on the old one was suspended was because Spark was going
> through so many changes at that time that a lot of work done, was wiped
> out by the changes towards 2.0.
> 
> I know that Lightbend was also interested in having bridge mode.
> 
> –
> Best regards,
> Radek Gruchalski
> radek@gruchalski.com <ma...@gruchalski.com>
> de.linkedin.com/in/radgruchalski
> 
> *Confidentiality:
> *This communication is intended for the above-named person and may be
> confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor
> must you copy or show it to anyone; please delete/destroy and inform the
> sender immediately.
> 
> 
> On May 23, 2016 at 7:14:51 PM, Timothy Chen (tnachen@gmail.com
> <ma...@gmail.com>) wrote:
> 
>> This will also simplify Mesos users as well, DCOS has to work around
>> this with our own proxying.
>>
>> Tim
>>
>> On Sun, May 22, 2016 at 11:53 PM, Gurvinder Singh
>> <gu...@uninett.no> wrote:
>> > Hi Reynold,
>> >
>> > So if that's OK with you, can I go ahead and create JIRA for this. As it
>> > seems this feature is missing currently and can benefit not just for
>> > kubernetes users but in general Spark standalone mode users too.
>> >
>> > - Gurvinder
>> > On 05/22/2016 12:49 PM, Gurvinder Singh wrote:
>> >> On 05/22/2016 10:23 AM, Sun Rui wrote:
>> >>> If it is possible to rewrite URL in outbound responses in Knox or other reverse proxy, would that solve your issue?
>> >> Any process which can keep track of workers and application drivers IP
>> >> addresses and route traffic to those will work. Considering Spark Master
>> >> does exactly this due to all workers and application has to register to
>> >> the master, therefore I propose master to be the place to add such a
>> >> functionality.
>> >>
>> >> I am not aware with Knox capabilities but Nginx or any other normal
>> >> reverse proxy will not be able to this on its own due to dynamic nature
>> >> of application drivers and to some extent workers too.
>> >>
>> >> - Gurvinder
>> >>>> On May 22, 2016, at 14:55, Gurvinder Singh <gu...@uninett.no> wrote:
>> >>>>
>> >>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>> >>>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>> >>>>>
>> >>>> Yeah kubernetes has ingress controller which can act the L7 load
>> >>>> balancer and router traffic to Spark UI in this case. But I am referring
>> >>>> to link present in UI to worker and application UI. Replied in the
>> >>>> detail to Sun Rui's mail where I gave example of possible scenario.
>> >>>>
>> >>>> - Gurvinder


Re: spark on kubernetes

Posted by Radoslaw Gruchalski <ra...@gruchalski.com>.
Sounds surprisingly close to this:
https://github.com/apache/spark/pull/9608

I can resurrect the work on the bridge mode for Spark 2. The work on the old one was suspended because Spark was going through so many changes at that time that a lot of the work done was wiped out by the changes towards 2.0.

I know that Lightbend was also interested in having bridge mode.
–  
Best regards,

Radek Gruchalski

radek@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must you copy or show it to anyone; please delete/destroy and inform the sender immediately.

On May 23, 2016 at 7:14:51 PM, Timothy Chen (tnachen@gmail.com) wrote:

This will also simplify Mesos users as well, DCOS has to work around  
this with our own proxying.  

Tim  

On Sun, May 22, 2016 at 11:53 PM, Gurvinder Singh  
<gu...@uninett.no> wrote:  
> Hi Reynold,  
>  
> So if that's OK with you, can I go ahead and create JIRA for this. As it  
> seems this feature is missing currently and can benefit not just for  
> kubernetes users but in general Spark standalone mode users too.  
>  
> - Gurvinder  
> On 05/22/2016 12:49 PM, Gurvinder Singh wrote:  
>> On 05/22/2016 10:23 AM, Sun Rui wrote:  
>>> If it is possible to rewrite URL in outbound responses in Knox or other reverse proxy, would that solve your issue?  
>> Any process which can keep track of workers and application drivers IP  
>> addresses and route traffic to those will work. Considering Spark Master  
>> does exactly this due to all workers and application has to register to  
>> the master, therefore I propose master to be the place to add such a  
>> functionality.  
>>  
>> I am not aware with Knox capabilities but Nginx or any other normal  
>> reverse proxy will not be able to this on its own due to dynamic nature  
>> of application drivers and to some extent workers too.  
>>  
>> - Gurvinder  
>>>> On May 22, 2016, at 14:55, Gurvinder Singh <gu...@uninett.no> wrote:  
>>>>  
>>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:  
>>>>> Kubernetes itself already has facilities for http proxy, doesn't it?  
>>>>>  
>>>> Yeah kubernetes has ingress controller which can act the L7 load  
>>>> balancer and router traffic to Spark UI in this case. But I am referring  
>>>> to link present in UI to worker and application UI. Replied in the  
>>>> detail to Sun Rui's mail where I gave example of possible scenario.  
>>>>  
>>>> - Gurvinder  


Re: spark on kubernetes

Posted by Timothy Chen <tn...@gmail.com>.
This will also simplify things for Mesos users; DCOS has to work around
this with our own proxying.

Tim

On Sun, May 22, 2016 at 11:53 PM, Gurvinder Singh
<gu...@uninett.no> wrote:
> Hi Reynold,
>
> So if that's OK with you, can I go ahead and create JIRA for this. As it
> seems this feature is missing currently and can benefit not just for
> kubernetes users but in general Spark standalone mode users too.
>
> - Gurvinder
> On 05/22/2016 12:49 PM, Gurvinder Singh wrote:
>> On 05/22/2016 10:23 AM, Sun Rui wrote:
>>> If it is possible to rewrite URL in outbound responses in Knox or other reverse proxy, would that solve your issue?
>> Any process which can keep track of workers and application drivers IP
>> addresses and route traffic to those will work. Considering Spark Master
>> does exactly this due to all workers and application has to register to
>> the master, therefore I propose master to be the place to add such a
>> functionality.
>>
>> I am not aware with Knox capabilities but Nginx or any other normal
>> reverse proxy will not be able to this on its own due to dynamic nature
>> of application drivers and to some extent workers too.
>>
>> - Gurvinder
>>>> On May 22, 2016, at 14:55, Gurvinder Singh <gu...@uninett.no> wrote:
>>>>
>>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>>>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>>>>>
>>>> Yeah kubernetes has ingress controller which can act the L7 load
>>>> balancer and router traffic to Spark UI in this case. But I am referring
>>>> to link present in UI to worker and application UI. Replied in the
>>>> detail to Sun Rui's mail where I gave example of possible scenario.
>>>>
>>>> - Gurvinder


Re: spark on kubernetes

Posted by Gurvinder Singh <gu...@uninett.no>.
Hi Reynold,

So if that's OK with you, can I go ahead and create a JIRA for this? It
seems this feature is currently missing and can benefit not just
Kubernetes users but Spark standalone mode users in general too.

- Gurvinder
On 05/22/2016 12:49 PM, Gurvinder Singh wrote:
> On 05/22/2016 10:23 AM, Sun Rui wrote:
>> If it is possible to rewrite URL in outbound responses in Knox or other reverse proxy, would that solve your issue?
> Any process which can keep track of worker and application driver IP
> addresses and route traffic to them will work. Since the Spark master
> does exactly this, because all workers and applications have to
> register with the master, I propose the master as the place to add
> such functionality.
> 
> I am not familiar with Knox's capabilities, but Nginx or any other
> ordinary reverse proxy will not be able to do this on its own due to
> the dynamic nature of application drivers and, to some extent, workers
> too.
> 
> - Gurvinder
>>> On May 22, 2016, at 14:55, Gurvinder Singh <gu...@uninett.no> wrote:
>>>
>>> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>>>> Kubernetes itself already has facilities for http proxy, doesn't it?
>>>>
>>> Yeah, Kubernetes has an ingress controller which can act as the L7 load
>>> balancer and route traffic to the Spark UI in this case. But I am referring
>>> to the links present in the UI to the worker and application UIs. I replied
>>> in detail to Sun Rui's mail, where I gave an example of a possible scenario.
>>>
>>> - Gurvinder
>>>>
>>>> On Sat, May 21, 2016 at 9:30 AM, Gurvinder Singh
>>>> <gurvinder.singh@uninett.no <ma...@uninett.no>> wrote:
>>>>
>>>>    Hi,
>>>>
>>>>    I am currently working on deploying Spark on kuberentes (K8s) and it is
>>>>    working fine. I am running Spark with standalone mode and checkpointing
>>>>    the state to shared system. So if master fails K8s starts it and from
>>>>    checkpoint it recover the earlier state and things just works fine. I
>>>>    have an issue with the Spark master Web UI to access the worker and
>>>>    application UI links. In brief, kubernetes service model allows me to
>>>>    expose the master service to internet, but accessing the
>>>>    application/workers UI is not possible as then I have to expose them too
>>>>    individually and given I can have multiple application it becomes hard
>>>>    to manage.
>>>>
>>>>    One solution can be that the master can act as reverse proxy to access
>>>>    information/state/logs from application/workers. As it has the
>>>>    information about their endpoint when application/worker register with
>>>>    master, so when a user initiate a request to access the information,
>>>>    master can proxy the request to corresponding endpoint.
>>>>
>>>>    So I am wondering if someone has already done work in this direction
>>>>    then it would be great to know. If not then would the community will be
>>>>    interesting in such feature. If yes then how and where I should get
>>>>    started as it would be helpful for me to have some guidance to start
>>>>    working on this.
>>>>
>>>>    Kind Regards,
>>>>    Gurvinder
>>>>
>>>>    ---------------------------------------------------------------------
>>>>    To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>    <ma...@spark.apache.org>
>>>>    For additional commands, e-mail: dev-help@spark.apache.org
>>>>    <ma...@spark.apache.org>
>>>>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
> 


Re: spark on kubernetes

Posted by Gurvinder Singh <gu...@uninett.no>.
On 05/22/2016 10:23 AM, Sun Rui wrote:
> If it is possible to rewrite URLs in outbound responses in Knox or another reverse proxy, would that solve your issue?
Any process that can keep track of the IP addresses of workers and
application drivers and route traffic to them will work. The Spark master
already does exactly this, because all workers and applications have to
register with it, so I propose the master as the place to add this
functionality.

I am not familiar with Knox's capabilities, but Nginx or any other ordinary
reverse proxy cannot do this on its own, because of the dynamic nature
of application drivers and, to some extent, workers too.
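
A minimal sketch of the routing idea, in Python for brevity. This is not
existing Spark code: the EndpointRegistry class, the /proxy/<id>/ path
layout, and the example addresses are all assumptions made for illustration.
It only shows how a registry that is updated on registration could map an
incoming request path to the current backend address.

```python
from typing import Dict, Optional


class EndpointRegistry:
    """Hypothetical map from worker/application ID to its registered UI URL."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, str] = {}

    def register(self, component_id: str, ui_url: str) -> None:
        # Would be called when a worker or application registers with the master.
        self._endpoints[component_id] = ui_url.rstrip("/")

    def unregister(self, component_id: str) -> None:
        # Would be called when a worker is lost or an application finishes.
        self._endpoints.pop(component_id, None)

    def resolve(self, request_path: str) -> Optional[str]:
        """Translate /proxy/<id>/<rest> into a backend URL, or None if unknown."""
        parts = request_path.lstrip("/").split("/", 2)
        if len(parts) < 2 or parts[0] != "proxy":
            return None
        backend = self._endpoints.get(parts[1])
        if backend is None:
            return None
        rest = parts[2] if len(parts) == 3 else ""
        return f"{backend}/{rest}"
```

Because the lookup happens per request, a driver that registers or goes
away is picked up immediately, which is the part a statically configured
reverse proxy would not handle on its own.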

- Gurvinder


Re: spark on kubernetes

Posted by Sun Rui <su...@163.com>.
If it is possible to rewrite URLs in outbound responses in Knox or another reverse proxy, would that solve your issue?
> On May 22, 2016, at 14:55, Gurvinder Singh <gu...@uninett.no> wrote:
> 
> On 05/22/2016 08:32 AM, Reynold Xin wrote:
>> Kubernetes itself already has facilities for http proxy, doesn't it?
>> 
> Yes, Kubernetes has an ingress controller, which can act as an L7 load
> balancer and route traffic to the Spark UI in this case. But I am referring
> to the links in the UI that point to the worker and application UIs. I replied
> in detail to Sun Rui's mail, where I gave an example of a possible scenario.
> 
> - Gurvinder



Re: spark on kubernetes

Posted by Gurvinder Singh <gu...@uninett.no>.
On 05/22/2016 08:32 AM, Reynold Xin wrote:
> Kubernetes itself already has facilities for http proxy, doesn't it?
> 
Yes, Kubernetes has an ingress controller, which can act as an L7 load
balancer and route traffic to the Spark UI in this case. But I am referring
to the links in the UI that point to the worker and application UIs. I replied
in detail to Sun Rui's mail, where I gave an example of a possible scenario.
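
To illustrate the link problem, here is a small Python sketch. It assumes
the master page contains absolute links to cluster-internal addresses,
which an ingress can forward but cannot make reachable from outside. The
rewrite_links function, the /proxy/<id> prefix, and the addresses are
hypothetical and only show the kind of rewriting that would be needed.

```python
import re
from typing import Dict


def rewrite_links(html: str, proxied: Dict[str, str]) -> str:
    """Replace internal UI base URLs in href attributes with /proxy/<id> paths.

    `proxied` maps a component ID to the internal base URL it registered with.
    """
    for component_id, base_url in proxied.items():
        base = base_url.rstrip("/")
        # Match the base URL followed by an optional path, up to the closing quote.
        pattern = re.compile(r'href="' + re.escape(base) + r'(/[^"]*)?"')
        html = pattern.sub(
            lambda m, cid=component_id: f'href="/proxy/{cid}{m.group(1) or "/"}"',
            html,
        )
    return html
```

Links to addresses that are not in the mapping are left untouched, so only
registered workers and applications end up behind the proxy prefix.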

- Gurvinder


Re: spark on kubernetes

Posted by Reynold Xin <rx...@databricks.com>.
Kubernetes itself already has facilities for http proxy, doesn't it?

