Posted to dev@reef.apache.org by Chenxi Zhao <Ch...@microsoft.com.INVALID> on 2018/04/10 22:56:10 UTC

[Open discussion] Refactor REEF Wake Transport Interface to support other networking framework

Hi all,

I am opening this thread to collect opinions on refactoring the Wake Transport interface in REEF to support other networking frameworks.

The motivation is to address two problems for Azure Batch users:

  1.  Collecting Driver statuses from the client.

Azure Batch sits behind a firewall, so there is no direct connection from the nodes in the cluster to the end user's PC.

  2.  Enabling "Inter-node communication" limits the maximum size of an Azure Batch pool. A pool of more than 100 nodes may not reach its desired size.

However, "Inter-node communication" is required to run REEF on Azure Batch, so that the Driver and the Evaluators can communicate with each other.

To solve these issues for REEF users, I propose using a proxy server solution to enable communication among the nodes, the client, and the Driver. Instead of having them talk to each other directly, we would have them exchange messages through a third-party web service. One solution I have been investigating is the Azure Relay service (https://docs.microsoft.com/en-us/azure/service-bus-relay/relay-what-is-it), which "facilitates hybrid applications by enabling you to securely expose services that reside within a corporate enterprise network to the public cloud, without having to open a firewall connection, or require intrusive changes to a corporate network infrastructure. Relay supports a variety of different transport protocols and web services standards."

Taking REEF Java as an example: communication inside REEF uses the Netty framework, which implements the transport layer. It is used for Driver-Client and Driver-Evaluator communication. The proposed change is to refactor the transport abstractions in Wake, org.apache.reef.wake.remote.transport.Link and org.apache.reef.wake.remote.transport.Transport, so that they are no longer "SocketAddress" based. That way we could implement other networking frameworks in addition to Netty, and end users would be able to configure the networking framework they want to use.
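For illustration only, here is one way the Link/Transport abstractions could look once they are no longer tied to SocketAddress. This is not the existing Wake API: RemoteAddress, InMemoryNetwork, and InMemoryTransport are hypothetical, and the Link and Transport below are simplified stand-ins that merely reuse the Wake names. An endpoint is identified by a scheme plus an opaque id (for example "tcp://host:port" for Netty or "relay://namespace/driver" for a relay), and the in-memory implementation plays the role that Netty or a relay-based framework would play.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical transport-neutral address: a scheme plus an opaque id, not a java.net.SocketAddress. */
final class RemoteAddress {
  final String scheme; // e.g. "tcp" for a Netty transport, "relay" for a relay service
  final String id;     // e.g. "host:port" for tcp, an endpoint name for relay

  RemoteAddress(String scheme, String id) {
    this.scheme = Objects.requireNonNull(scheme);
    this.id = Objects.requireNonNull(id);
  }

  /** Parses "scheme://id"; each transport decides how to interpret the id. */
  static RemoteAddress parse(String uri) {
    int sep = uri.indexOf("://");
    if (sep <= 0) {
      throw new IllegalArgumentException("Expected scheme://id, got " + uri);
    }
    return new RemoteAddress(uri.substring(0, sep), uri.substring(sep + 3));
  }

  @Override public String toString() { return scheme + "://" + id; }
}

/** Hypothetical one-way link to a remote endpoint (simplified, not Wake's Link). */
interface Link<T> {
  void write(T message);
  RemoteAddress getRemoteAddress();
}

/** Hypothetical transport: nothing in these signatures refers to sockets. */
interface Transport<T> extends AutoCloseable {
  RemoteAddress getLocalAddress();
  Link<T> open(RemoteAddress remote);
  @Override void close(); // narrowed: no checked exception
}

/** Shared in-memory "network" standing in for Netty, a relay service, etc. */
final class InMemoryNetwork<T> {
  // Keyed by the "scheme://id" string so parsed and constructed addresses match.
  private final Map<String, Consumer<T>> endpoints = new ConcurrentHashMap<>();

  void register(RemoteAddress address, Consumer<T> receiver) {
    if (endpoints.putIfAbsent(address.toString(), receiver) != null) {
      throw new IllegalStateException("Address already in use: " + address);
    }
  }

  void unregister(RemoteAddress address) { endpoints.remove(address.toString()); }

  void deliver(RemoteAddress to, T message) {
    Consumer<T> receiver = endpoints.get(to.toString());
    if (receiver == null) {
      throw new IllegalStateException("No endpoint at " + to);
    }
    receiver.accept(message);
  }
}

/** One possible Transport; a relay-backed class would implement the same interface. */
final class InMemoryTransport<T> implements Transport<T> {
  private final InMemoryNetwork<T> network;
  private final RemoteAddress local;

  InMemoryTransport(InMemoryNetwork<T> network, String name, Consumer<T> receiver) {
    this.network = network;
    this.local = new RemoteAddress("mem", name);
    network.register(local, receiver);
  }

  @Override public RemoteAddress getLocalAddress() { return local; }

  @Override public Link<T> open(RemoteAddress remote) {
    return new Link<T>() {
      @Override public void write(T message) { network.deliver(remote, message); }
      @Override public RemoteAddress getRemoteAddress() { return remote; }
    };
  }

  @Override public void close() { network.unregister(local); }
}

class TransportDemo {
  public static void main(String[] args) {
    InMemoryNetwork<String> network = new InMemoryNetwork<>();
    List<String> driverInbox = new ArrayList<>();
    try (Transport<String> driver = new InMemoryTransport<>(network, "driver", driverInbox::add);
         Transport<String> evaluator = new InMemoryTransport<>(network, "evaluator-1", m -> { })) {
      evaluator.open(driver.getLocalAddress()).write("heartbeat");
    }
    System.out.println(driverInbox); // prints [heartbeat]
  }
}
```

With this shape, a Netty-backed implementation would interpret "tcp" ids as host:port pairs and a relay-backed one would interpret "relay" ids as relay endpoint names, while code above the transport only sees Transport and Link. Choosing the implementation could then be a configuration binding, which in REEF would presumably go through Tang.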

REEF .NET has a separate Wake package. From what I understand, its design is a bit different from Java's: I haven't found an equivalent transport interface yet. But the same idea should apply, by refactoring "ITcpClientConnectionFactory" to provide a transport interface.

Any thoughts?

I also found some related items that may be affected:

  1.  There is a JIRA ticket, "Improve Communication inside REEF" (https://issues.apache.org/jira/browse/REEF-1759).
  2.  A PR (https://github.com/apache/reef/pull/1341) implements a Wake Transport using HTTP.
  3.  I noticed that the .NET implementation has an interface "IJobSubmissionResult", which provides the URL of an HTTP server inside the Driver. This bypasses the Wake interface and is not aligned with the Java client implementation, which uses RemoteManager and the Wake transport. Could someone clarify why we are not using the Wake interface here?

Appreciate your feedback!

Chenxi

Re: [Open discussion] Refactor REEF Wake Transport Interface to support other networking framework

Posted by Byung-Gon Chun <bg...@gmail.com>.
Thanks for the explanation.
Having a relay service as transport makes a lot of sense.
(Initially, we thought about having various transport mechanisms including
message queues.)

-Gon

On Wed, Apr 11, 2018 at 10:24 AM, Chenxi Zhao <
Chenxi.Zhao@microsoft.com.invalid> wrote:

> Hi Gon,
>
> Unfortunately, I don’t think using HTTP/HTTPS would solve the problem.
> Correct me if I misunderstand your point.
>
> The blocker with HTTP/HTTPS is that a connection cannot be established
> when the two endpoints are not on a connected network.
>
> In the Azure Batch case,
> the client runs in Network A;
> the client submits a job through the Azure Batch service to an Azure Batch
> cluster, which runs in Network B.
> The problems described in my previous email are that a. Network A cannot
> talk to Network B; b. enabling communication within Network B limits the
> Azure Batch cluster size.
>
> That's why we want to add the ability to use a relay service on the public
> internet, so that Network A and Network B can talk to each other through
> Azure Relay.
>
> Regards,
> Chenxi
>



-- 
Byung-Gon Chun

RE: [Open discussion] Refactor REEF Wake Transport Interface to support other networking framework

Posted by Chenxi Zhao <Ch...@microsoft.com.INVALID>.
Hi Gon,

Unfortunately, I don’t think using HTTP/HTTPS would solve the problem. Correct me if I misunderstand your point.

The blocker with HTTP/HTTPS is that a connection cannot be established when the two endpoints are not on a connected network.

In the Azure Batch case,
the client runs in Network A;
the client submits a job through the Azure Batch service to an Azure Batch cluster, which runs in Network B.
The problems described in my previous email are that a. Network A cannot talk to Network B; b. enabling communication within Network B limits the Azure Batch cluster size.

That's why we want to add the ability to use a relay service on the public internet, so that Network A and Network B can talk to each other through Azure Relay.
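A toy model of the topology described above may help show why a relay sidesteps the connectivity problem. It does not use the Azure Relay SDK; Host, Relay, and the endpoint name "client/status" are hypothetical. The client (Network A) and the Driver (Network B) cannot connect to each other directly, but both can reach the relay with outbound connections, and the relay forwards messages by endpoint name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical host model: hosts in different networks cannot open connections to each other. */
final class Host {
  final String name;
  final String network; // "A" = user's PC, "B" = Azure Batch pool

  Host(String name, String network) {
    this.name = name;
    this.network = network;
  }

  boolean canConnectDirectlyTo(Host other) {
    return network.equals(other.network);
  }
}

/**
 * Hypothetical relay on the public internet. Every host reaches it with an
 * outbound connection, so neither network has to accept inbound traffic.
 */
final class Relay {
  private final Map<String, Consumer<String>> listeners = new ConcurrentHashMap<>();

  /** A host registers a named endpoint over its own outbound connection. */
  void listen(String endpoint, Consumer<String> handler) {
    if (listeners.putIfAbsent(endpoint, handler) != null) {
      throw new IllegalStateException("Endpoint already registered: " + endpoint);
    }
  }

  /** Forwards a message to a named endpoint; returns false if nobody is listening. */
  boolean send(Host from, String endpoint, String message) {
    Consumer<String> handler = listeners.get(endpoint);
    if (handler == null) {
      return false;
    }
    handler.accept(from.name + ": " + message);
    return true;
  }
}

class RelayDemo {
  public static void main(String[] args) {
    Host client = new Host("client", "A");
    Host driver = new Host("driver", "B");
    Relay relay = new Relay();
    List<String> clientInbox = new ArrayList<>();

    // The client listens on the relay instead of opening a port in Network A.
    relay.listen("client/status", clientInbox::add);

    System.out.println(driver.canConnectDirectlyTo(client)); // prints false
    relay.send(driver, "client/status", "RUNNING");
    System.out.println(clientInbox); // prints [driver: RUNNING]
  }
}
```

The same idea would apply to Driver-Evaluator traffic inside Network B: if those messages also went through the relay, the pool would not need "Inter-node communication" enabled.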

Regards,
Chenxi

-----Original Message-----
From: Byung-Gon Chun <bg...@gmail.com> 
Sent: Tuesday, April 10, 2018 4:14 PM
To: dev@reef.apache.org
Subject: Re: [Open discussion] Refactor REEF Wake Transport Interface to support other networking framework

Hi Chenxi,

Thanks for sharing a detailed proposal!
The extension sounds interesting and it makes sense.

I'm not familiar with the environment setup.
There's an ongoing effort to support HTTP/HTTPS in Wake.
Would HTTPS solve your problem?
If HTTPS did solve the problem you described, would you still prefer to use a relay service?

Thanks!
-Gon



--
Byung-Gon Chun

Re: [Open discussion] Refactor REEF Wake Transport Interface to support other networking framework

Posted by Byung-Gon Chun <bg...@gmail.com>.
Hi Chenxi,

Thanks for sharing a detailed proposal!
The extension sounds interesting and it makes sense.

I'm not familiar with the environment setup.
There's an ongoing effort to support HTTP/HTTPS in Wake.
Would HTTPS solve your problem?
If HTTPS did solve the problem you described, would you still prefer
to use a relay service?

Thanks!
-Gon


-- 
Byung-Gon Chun