Posted to dev@reef.apache.org by Tyson Condie <tc...@gmail.com> on 2018/04/10 16:59:11 UTC

gRPC based Java Bridge

Hello,

We (myself, Doug Service and Scott Inglis) are in the process of developing
a new REEF Java Bridge that will use a two process solution to communicate
between the core Java Driver and an application Driver, which could be
implemented in an alternative language e.g., C#.

Communication between these two worlds will occur over gRPC using protocol
buffers 3.5 as the data format. However, the code will be structured in a
way that minimizes such dependencies in the case that an alternative/better
communication medium presents itself.
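
As a rough illustration of that structure (a minimal sketch, not the actual
bridge code): the driver logic could depend only on a small
transport-neutral interface, with gRPC as one implementation behind it. All
names below (BridgeChannel, InMemoryLink, BridgeSketch) are hypothetical,
and an in-memory link stands in for gRPC so the example runs without any
dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical names throughout; none of these are REEF or gRPC APIs.
final class BridgeSketch {
  public static void main(final String[] args) {
    final InMemoryLink link = new InMemoryLink();
    final List<String> received = new ArrayList<>();
    // The core Java Driver side registers a handler for incoming messages.
    link.coreSide().onMessage(bytes -> received.add(new String(bytes, StandardCharsets.UTF_8)));
    // The application Driver side sends a serialized message across the link.
    link.appSide().send("evaluator-allocated".getBytes(StandardCharsets.UTF_8));
    System.out.println(received); // prints [evaluator-allocated]
  }
}

/** Transport-neutral endpoint: driver code depends only on this interface. */
interface BridgeChannel {
  void send(byte[] payload);

  void onMessage(Consumer<byte[]> handler);
}

/** Stand-in transport (an assumption for this sketch) joining two endpoints in one JVM. */
final class InMemoryLink {
  private final Endpoint core = new Endpoint();
  private final Endpoint app = new Endpoint();

  InMemoryLink() {
    core.peer = app;
    app.peer = core;
  }

  BridgeChannel coreSide() {
    return core;
  }

  BridgeChannel appSide() {
    return app;
  }

  private static final class Endpoint implements BridgeChannel {
    private Endpoint peer;
    private Consumer<byte[]> handler = bytes -> { };

    @Override
    public void send(final byte[] payload) {
      // Deliver a copy to whatever handler the other side registered.
      peer.handler.accept(payload.clone());
    }

    @Override
    public void onMessage(final Consumer<byte[]> newHandler) {
      this.handler = newHandler;
    }
  }
}
```

Under this assumption, swapping gRPC for another medium would mean writing
one more BridgeChannel implementation rather than touching driver logic.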

Current status: the core Java Driver is nearly code complete and I am
working on an application (client) Driver in Java (as well) that can be
used as a template for the C# application Driver, and for authoring unit
tests. Please expect a pull request with these changes in the coming days.
Concurrently, Doug Service and Scott Inglis will be developing a C# based
application Driver.

The core Java Driver and Java client/application Driver changes can be
tracked via Jira 2002 (https://issues.apache.org/jira/browse/REEF-2002),
which is a subtask to Jira 335 for removing the managed C++ Java bridge.

Your feedback would be most welcome!

Thanks
Tyson

Re: gRPC based Java Bridge

Posted by Byung-Gon Chun <bg...@gmail.com>.
Good to adopt protobuf, then. :)

On Tue, Apr 17, 2018 at 8:23 AM, Markus Weimer <ma...@weimo.de> wrote:

> On Mon, Apr 16, 2018 at 4:10 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
> > We initially used protobuf but switched to avro because of its better
> > .Net support.
> > Thus, we have both protobuf and avro.
>
> Yes, and the .NET support in protobuf got *significantly* better
> since. Version 3 and newer of protobuf, as well as current gRPC
> versions have 1st class support for .NET, while Avro really is still
> kinda bad.
>
> Markus
>



-- 
Byung-Gon Chun

Re: gRPC based Java Bridge

Posted by Markus Weimer <ma...@weimo.de>.
On Mon, Apr 16, 2018 at 4:37 PM, Sergiy Matusevych
<se...@gmail.com> wrote:
> By the way, do we know if YARN is planning to migrate to a newer version of
> protobuf? As far as I know, Hadoop 3.1 still uses protobuf 2.5 :(

My hunch is that that will never happen :)

> Anyway, we can probably just shade protobuf 2.5 together with the Hadoop jars
> and use a newer protobuf for both gRPC and config serialization.

Yes, we probably have to do that. Or rather, have YARN shade their version.

Markus

Re: gRPC based Java Bridge

Posted by Sergiy Matusevych <se...@gmail.com>.
By the way, do we know if YARN is planning to migrate to a newer version of
protobuf? As far as I know, Hadoop 3.1 still uses protobuf 2.5 :(

Anyway, we can probably just shade protobuf 2.5 together with the Hadoop jars
and use a newer protobuf for both gRPC and config serialization.

-- Sergiy.


On Mon, Apr 16, 2018 at 4:23 PM, Markus Weimer <ma...@weimo.de> wrote:

> On Mon, Apr 16, 2018 at 4:10 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
> > We initially used protobuf but switched to avro because of its better
> > .Net support.
> > Thus, we have both protobuf and avro.
>
> Yes, and the .NET support in protobuf got *significantly* better
> since. Version 3 and newer of protobuf, as well as current gRPC
> versions have 1st class support for .NET, while Avro really is still
> kinda bad.
>
> Markus
>

Re: gRPC based Java Bridge

Posted by Markus Weimer <ma...@weimo.de>.
On Mon, Apr 16, 2018 at 4:10 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
> We initially used protobuf but switched to avro because of its better .Net support.
> Thus, we have both protobuf and avro.

Yes, and the .NET support in protobuf got *significantly* better
since. Version 3 and newer of protobuf, as well as current gRPC
versions have 1st class support for .NET, while Avro really is still
kinda bad.

Markus

Re: gRPC based Java Bridge

Posted by Byung-Gon Chun <bg...@gmail.com>.
Thanks for the explanation, Tyson.
Yes. It makes sense.

Sergiy, I agree with you.

We initially used protobuf but switched to avro because of its better .Net
support.
Thus, we have both protobuf and avro.

-Gon



On Tue, Apr 17, 2018 at 8:05 AM, Sergiy Matusevych <se...@gmail.com> wrote:

> Hi guys,
>
> gRPC sounds great, and the more REEF code we can delete in the process,
> the better! For me, however, the most important aspect of this effort is to
> create a unified protocol definition for all REEF components. I would love
> to be able to go to the REEF code, open a few *.proto files and from that
> figure out how the entire system works on the high level. That is, our
> .proto files would serve as an executable specification for REEF components.
>
> Cheers,
> Sergiy.


-- 
Byung-Gon Chun

Re: gRPC based Java Bridge

Posted by Sergiy Matusevych <se...@gmail.com>.
Hi guys,

gRPC sounds great, and the more REEF code we can delete in the process, the
better! For me, however, the most important aspect of this effort is to
create a unified protocol definition for all REEF components. I would love
to be able to go to the REEF code, open a few *.proto files and from that
figure out how the entire system works on the high level. That is, our
.proto files would serve as an executable specification for REEF components.

Cheers,
Sergiy.




On Mon, Apr 16, 2018 at 2:01 PM, Tyson Condie <tc...@gmail.com>
wrote:

> The key solution that the new bridge will take on involves a dual process
> design, where the core java driver is in one process, and the application
> driver is in the other process. From there, we need to communicate
> information between these two worlds. gRPC+protocol buffers is one way of
> doing that communication. Since protocol buffers are already used between
> driver and evaluator, I felt the same would be best between
> driver-to-driver, and gRPC seems to be a well tested/supported RPC layer
> for protocol buffers.
>
> -Tyson

Re: gRPC based Java Bridge

Posted by Tyson Condie <tc...@gmail.com>.
The key solution that the new bridge will take on involves a dual process
design, where the core java driver is in one process, and the application
driver is in the other process. From there, we need to communicate
information between these two worlds. gRPC+protocol buffers is one way of
doing that communication. Since protocol buffers are already used between
driver and evaluator, I felt the same would be best between
driver-to-driver, and gRPC seems to be a well tested/supported RPC layer
for protocol buffers.

-Tyson

On Tue, Apr 10, 2018 at 4:08 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Tyson, thanks for the update!
> Could you give us a background on why a gRPC-based solution's introduced?
>
> Thanks!
> -Gon

Re: gRPC based Java Bridge

Posted by Byung-Gon Chun <bg...@gmail.com>.
Thanks for resending the email, Doug.
I am sorry that I didn't reply earlier.

Somehow I just thought that you went ahead with Thrift.

-Gon



On Wed, Apr 11, 2018 at 9:37 AM, Douglas Service <ds...@gmail.com> wrote:

> I had proposed that we use Thrift instead a while back on the dev list (see
> email below) but received no comments. It is worth a read as it discusses
> many of the issues. As an Apache project, it seems we should be using Apache
> Thrift, which has all of the same functionality as gRPC that we would use.
> Using gRPC creates a dependency on a Google copyright, which should be
> carefully considered, especially when there is an Apache-owned alternative.
>
> Doug



-- 
Byung-Gon Chun

Re: gRPC based Java Bridge

Posted by John Yang <jo...@gmail.com>.
Hi Markus,


In Nemo, we use a wrapper RPC interface on top of two implementation options:
NCS (NetworkConnectionService) and gRPC.
The default implementation that we use is NCS, but switching to gRPC is
also possible.

In the link below, you can see
'@DefaultImplementation(NcsMessageEnvironment.class)' right above 'public
interface MessageEnvironment'.
https://github.com/apache/incubator-nemo/blob/master/runtime/common/src/main/java/edu/snu/nemo/runtime/common/message/MessageEnvironment.java
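
To illustrate the general pattern (a minimal sketch only, not Nemo's or
Tang's actual code): an interface names its default implementation through
an annotation, and a factory falls back to that default unless the caller
asks for another implementation. The annotation DefaultImpl, the interface
MessageEnv, both implementations, and the Environments factory are all
hypothetical stand-ins written for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical names throughout; this does not use Tang or Nemo classes.
final class DefaultImplSketch {
  public static void main(final String[] args) {
    // No override: the annotated default is used.
    System.out.println(Environments.create(null).describe()); // prints ncs
    // Explicit override: the alternative implementation is used instead.
    System.out.println(Environments.create(GrpcLikeEnvironment.class).describe()); // prints grpc
  }
}

/** Stand-in for a "default implementation" annotation, readable at runtime. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DefaultImpl {
  Class<?> value();
}

/** Wrapper interface that callers program against. */
@DefaultImpl(NcsLikeEnvironment.class)
interface MessageEnv {
  String describe();
}

final class NcsLikeEnvironment implements MessageEnv {
  @Override
  public String describe() {
    return "ncs";
  }
}

final class GrpcLikeEnvironment implements MessageEnv {
  @Override
  public String describe() {
    return "grpc";
  }
}

final class Environments {
  private Environments() { }

  /** Instantiates the override if given, otherwise the annotated default. */
  static MessageEnv create(final Class<? extends MessageEnv> override) {
    final Class<?> impl = override != null
        ? override
        : MessageEnv.class.getAnnotation(DefaultImpl.class).value();
    try {
      return MessageEnv.class.cast(impl.getDeclaredConstructor().newInstance());
    } catch (final ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
    }
  }
}
```

In a real system the injector would do this resolution; the reflection here
only makes the sketch self-contained.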

The Nemo team is very happy with REEF by the way. :)


Thanks,
John


On Mon, Apr 16, 2018 at 2:59 PM, Markus Weimer <ma...@weimo.de> wrote:

> On Sun, Apr 15, 2018 at 4:24 PM, Byung-Gon Chun <bg...@gmail.com> wrote:
>
> > The Nemo team created an abstraction for the channel and implemented it
> > using two different libraries. They are interchangeable.
> >
>
> Can you share more about this? Does this mean that there is a Wake backend
> across gRPC?
>
> Markus
>

Re: gRPC based Java Bridge

Posted by Markus Weimer <ma...@weimo.de>.
On Sun, Apr 15, 2018 at 4:24 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> The Nemo team created an abstraction for the channel and implemented it
> using two different libraries. They are interchangeable.
>

Can you share more about this? Does this mean that there is a Wake backend
across gRPC?

Markus

Re: gRPC based Java Bridge

Posted by Byung-Gon Chun <bg...@gmail.com>.
Using gRPC makes sense. We have some experience with using gRPC in Nemo.
Nemo has a channel between Driver and Worker, which is based on Wake/Netty
and gRPC.
The Nemo team created an abstraction for the channel and implemented it
using two different libraries. They are interchangeable.

Cheers,
Gon


On Mon, Apr 16, 2018 at 7:21 AM, Rogan Carr <ro...@gmail.com> wrote:

> Hi Markus,
>
> In general, I like having fewer dependencies and smaller codebases, so if
> we were to adopt gRPC for the bridge, I'd be in favor of using it in Wake
> as well.
>
> Best,
> Rogan



-- 
Byung-Gon Chun

Re: gRPC based Java Bridge

Posted by Rogan Carr <ro...@gmail.com>.
Hi Markus,

In general, I like having fewer dependencies and smaller codebases, so if
we were to adopt gRPC for the bridge, I'd be in favor of using it in Wake
as well.

Best,
Rogan

On Sun, Apr 15, 2018 at 9:40 AM, Markus Weimer <ma...@weimo.de> wrote:

> On Thu, Apr 12, 2018 at 12:53 AM, Byung-Gon Chun <bg...@gmail.com> wrote:
> > Both Thrift and gRPC sound reasonable. Is there any reason to choose gRPC
> > over Thrift?
>
> There seems to be a lot of momentum towards gRPC right now. It has
> solid support for .NET, and uses HTTP/2 as its transport layer. The
> former is very interesting to us, the latter makes things much, much
> easier on clusters: People like to use proxies and redirects and other
> features of HTTP in their cluster operations.
>
> But this raises a broader question: As we take a dependency on gRPC,
> we could eliminate a bunch of code in REEF, as it replaces both the
> Avro and the Netty layers of Wake. WDYT about that?
>
> Thanks,
>
> Markus
>

Re: gRPC based Java Bridge

Posted by Markus Weimer <ma...@weimo.de>.
On Thu, Apr 12, 2018 at 12:53 AM, Byung-Gon Chun <bg...@gmail.com> wrote:
> Both Thrift and gRPC sound reasonable. Is there any reason to choose gRPC
> over Thrift?

There seems to be a lot of momentum towards gRPC right now. It has
solid support for .NET, and uses HTTP/2 as its transport layer. The
former is very interesting to us, the latter makes things much, much
easier on clusters: People like to use proxies and redirects and other
features of HTTP in their cluster operations.

But this raises a broader question: As we take a dependency on gRPC,
we could eliminate a bunch of code in REEF, as it replaces both the
Avro and the Netty layers of Wake. WDYT about that?

Thanks,

Markus

Re: gRPC based Java Bridge

Posted by Byung-Gon Chun <bg...@gmail.com>.
Doug and Tyson,

Both Thrift and gRPC sound reasonable. Is there any reason to choose gRPC
over Thrift?

Thanks.
-Gon


On Thu, Apr 12, 2018 at 5:01 AM, Markus Weimer <ma...@weimo.de> wrote:

> On Tue, Apr 10, 2018 at 5:37 PM, Douglas Service <ds...@gmail.com>
> wrote:
>
> > Using gRPC creates a dependency on a Google copyright which should be
> > carefully considered especially when there is an Apache owned alternative.
> >
>
> One correction: gRPC isn't owned by Google, but by the Cloud Native
> Computing Foundation.
>
> Markus
>



-- 
Byung-Gon Chun

Re: gRPC based Java Bridge

Posted by Markus Weimer <ma...@weimo.de>.
On Tue, Apr 10, 2018 at 5:37 PM, Douglas Service <ds...@gmail.com> wrote:

> Using gRPC creates a dependency on a Google copyright which should be
> carefully considered especially when there is an Apache owned alternative.
>

One correction: gRPC isn't owned by Google, but by the Cloud Native
Computing Foundation.

Markus

Re: gRPC based Java Bridge

Posted by Douglas Service <ds...@gmail.com>.
I had proposed that we use Thrift instead a while back on the dev list (see
email below) but received no comments. It is worth a read as it discusses
many of the issues. As an Apache project, it seems we should be using Apache
Thrift, which has all of the same functionality as gRPC that we would use.
Using gRPC creates a dependency on a Google copyright, which should be
carefully considered, especially when there is an Apache-owned alternative.

Doug

-------------------------------------------------------------------------------------------

The Apache Thrift team has implemented full .NET Core 2.0 support in the
latest release. Thrift is similar to Avro and Protobuf, but supports many
more languages than Avro (20+), and has RPC protocol support across all
languages, which Avro does not. Thrift also provides a robust cross-language
test suite that literally runs every test across every permutation of
languages one configures in the Thrift build environment. There are many
online comparisons of Thrift and Protobuf, and they are similar in features
and performance. Apache Thrift is used in production systems, most notably by
Cloudera, Evernote, Facebook, Mendeley, and Uber, and is used or supported
by a number of Apache projects such as Hadoop, Aurora, HBase, Parquet, and
Storm. Most importantly for REEF, Thrift is controlled by Apache.

There have been some suggestions to change the approach to modifying the
bridge to run on Linux. The possible approaches are:

1) Continue with Avro.
2) Switch to Thrift using a similar approach to the Avro approach.
3) Switch to keeping the C++, converting it to native, and using PInvoke to
call from C# to C++, C# delegates to call from C++ to C#, and continue
using the current JNI code to interface between Java and C#.

The advantage of using Avro is that all of the data types for any supported
language are autogenerated with the necessary marshaling/unmarshaling
code. If you look at the current bridge, you will see that most of the code
is handwritten data types duplicated across languages and associated
cross-language conversion code. The disadvantage of Avro is that it does
not support RPC protocol definitions between Java and C#, and it has no
transport support; thus we have to build the protocol's transport by hand.
In addition, we are using a combination of Microsoft/Apache Avro, which
means there is more work to do in the future on the Avro side.

Thrift has all of the advantages of Avro, and in addition, it supports full
RPC protocol definition and generates code for transports such as TCP and
pipes, and wire formats such as binary and JSON. Using Thrift would
eliminate all of the custom hand-coding and marshaling of types in the
bridge, as Avro does, and also eliminate the need to write the protocol code
and transport code.

The advantage of delegates and PInvoke is that we would keep a lot of the
existing bridge code and possibly get done faster. In this case, all of the
code used for interop would need to move out of the current bridge
executable into a DLL on Windows and a library, possibly shared, on Linux.
All of the managed types in C++ would have to become unmanaged, and then
conversion code would have to be written to convert from managed types to
unmanaged types and from unmanaged types to JNI types and back. The
disadvantage of delegates and PInvoke is that the work is most likely all
throwaway.

Going forward, there seems to be strong agreement that we
should only have a single primary language, such as C# or Java, in which
most of the core functionality is written, with an asynchronous
cross-language messaging API. (Talking to Spark would still require Java.)
This would allow us to stop implementing core functionality in both C# and
Java, and we would be able to support applications written in any language
supported by Thrift or a language that supports calling into C++ such as R
or Julia. Thrift is an excellent choice for this asynchronous messaging API
and adopting it now would put us on the road to this future architecture.
Currently it would probably have to be synchronous due to the current
bridge design, but could then be asynchronous in the future. There is some
concern that a messaging API will be slower than interop, but as Markus
points out, the limiting factor will be the time it takes to get messages
between the driver and an evaluator on different nodes in the cluster, and
not the time it takes to get a message between the two processes in the driver
running on the same node. I would also encourage that we keep REEF code in
the non-core languages as thin as possible.
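
To make the "synchronous now, asynchronous later" point concrete, here is a
minimal sketch in plain Java, with no Thrift or gRPC involved. It assumes a
hypothetical DriverMessenger whose request method returns a future; a caller
that must stay synchronous blocks on the result, while a later caller can
attach a continuation instead. The class names and the "ack:" reply format
are invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical names throughout; the reply is simulated locally.
final class AsyncMessagingSketch {
  public static void main(final String[] args) {
    final ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      final DriverMessenger messenger = new DriverMessenger(pool);
      // Synchronous use: block until the reply arrives.
      System.out.println(messenger.request("status").join()); // prints ack:status
      // Asynchronous use: register a continuation; join() only keeps main alive here.
      messenger.request("submit-task").thenAccept(System.out::println).join(); // prints ack:submit-task
    } finally {
      pool.shutdown();
    }
  }
}

/** Messaging API that is asynchronous by construction. */
final class DriverMessenger {
  private final ExecutorService executor;

  DriverMessenger(final ExecutorService executor) {
    this.executor = executor;
  }

  /** Sends a message and completes the future with the (simulated) reply. */
  CompletableFuture<String> request(final String message) {
    return CompletableFuture.supplyAsync(() -> "ack:" + message, executor);
  }
}
```

The design choice this shows is that an asynchronous API can always be
consumed synchronously, whereas the reverse would require changing the API.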

Comments?

Doug



On Tue, Apr 10, 2018 at 11:08 PM, Byung-Gon Chun <bg...@gmail.com> wrote:

> Tyson, thanks for the update!
> Could you give us a background on why a gRPC-based solution's introduced?
>
> Thanks!
> -Gon

Re: gRPC based Java Bridge

Posted by Byung-Gon Chun <bg...@gmail.com>.
Tyson, thanks for the update!
Could you give us a background on why a gRPC-based solution's introduced?

Thanks!
-Gon

On Wed, Apr 11, 2018 at 1:59 AM, Tyson Condie <tc...@gmail.com>
wrote:

> Hello,
>
> We (myself, Doug Service and Scott Inglis) are in the process of developing
> a new REEF Java Bridge that will use a two process solution to communicate
> between the core Java Driver and an application Driver, which could be
> implemented in an alternative language e.g., C#.
>
> Communication between these two worlds will occur over gRPC using protocol
> buffers 3.5 as the data format. However, the code will be structured in a
> way that minimizes such dependencies in the case that an alternative/better
> communication medium presents itself.
>
> Current status: the core Java Driver is nearly code complete and I am
> working on an application (client) Driver in Java (as well) that can be
> used as a template for the C# application Driver, and for authoring unit
> tests. Please expect a pull request with these changes in the coming days.
> Concurrently, Doug Service and Scott Inglis will be developing a C# based
> application Driver.
>
> The core Java Driver and Java client/application Driver changes can be
> tracked via Jira 2002 (https://issues.apache.org/jira/browse/REEF-2002),
> which is a subtask to Jira 335 for removing the managed C++ Java bridge.
>
> Your feedback would be most welcome!
>
> Thanks
> Tyson
>



-- 
Byung-Gon Chun