Posted to dev@nifi.apache.org by Michael Hogue <mi...@gmail.com> on 2017/06/06 18:28:49 UTC

NIFI-3641

All,

   As an initial dive into NiFi I elected to take a stab at NIFI-3641
<https://issues.apache.org/jira/browse/NIFI-3641> (in particular, the
site-to-site bit), but I've got a few high-level questions before taking
off down a path.

   Would it be more desirable to have a gRPC site-to-site mechanism or the
ability to generally interact with external gRPC services, such as
TensorFlow, in a processor? That might help guide the approach I take in
addressing the issue. I'm not entirely sure it's possible to do both with
the same solution since the current site-to-site API assumes the other end
is a NiFi instance. On top of that, gRPC provides a number of load
balancing, remote gRPC server (i.e. peer) selection, and serialization
mechanisms that the current site-to-site implementation handles itself.
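
As a rough illustration of what client-side peer selection means here, the
sketch below rotates through a fixed list of peers in round-robin order. It
is an assumption-laden toy: the class name RoundRobinPeerSelector and the
host:port strings are hypothetical, and it is neither NiFi's site-to-site
peer selection nor gRPC's load balancing API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin peer selector; not NiFi's or gRPC's implementation. */
final class RoundRobinPeerSelector {
    private final List<String> peers;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinPeerSelector(List<String> peers) {
        if (peers.isEmpty()) {
            throw new IllegalArgumentException("at least one peer is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.peers = List.copyOf(peers);
    }

    /** Returns the next peer, wrapping around at the end of the list. */
    String next() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(cursor.getAndIncrement(), peers.size());
        return peers.get(index);
    }

    public static void main(String[] args) {
        // Placeholder addresses, assumed for illustration only.
        RoundRobinPeerSelector selector = new RoundRobinPeerSelector(
                List.of("nifi-1:8443", "nifi-2:8443", "nifi-3:8443"));
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.next());
        }
    }
}
```

Running main prints the three placeholder peers in order and then the first
one again. Logic of this kind is what a gRPC-based transport could delegate
to the library instead of site-to-site implementing it itself.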

   What I'm looking for here are some thoughts and/or guidance on a
recommended approach and intended goal.

Thanks,
Mike

Re: NIFI-3641

Posted by Michael Hogue <mi...@gmail.com>.
Yes. I've created NIFI-4037
<https://issues.apache.org/jira/browse/NIFI-4037> and NIFI-4038
<https://issues.apache.org/jira/browse/NIFI-4038> to capture this work and
linked them to the original NIFI-3641
<https://issues.apache.org/jira/browse/NIFI-3641>. Additionally, I had
created a design document [1] for the site-to-site piece that should
probably be removed until that piece is addressed. I don't appear to have
permission to remove it.

Thanks,
Mike

1.
https://cwiki.apache.org/confluence/display/NIFI/Support+gRPC+as+a+transport+mechanism+for+Site-to-Site

On Wed, Jun 7, 2017 at 2:48 PM Tony Kurc <tr...@gmail.com> wrote:

> Mike,
> Do you plan to put in a new ticket for a gRPC processor without the
> site-to-site?

Re: NIFI-3641

Posted by Tony Kurc <tr...@gmail.com>.
Mike,
Do you plan to put in a new ticket for a gRPC processor without the
site-to-site?

Re: NIFI-3641

Posted by Joe Witt <jo...@gmail.com>.
I'm definitely a +1.  That path makes total sense to me and lets us
all learn a bit more about gRPC and what it could do as we move along.

Thanks

Re: NIFI-3641

Posted by Michael Hogue <mi...@gmail.com>.
Koji,

   I like that idea and it seems like a simple enough approach for
introducing gRPC into NiFi. I appreciate all of the feedback. If there are
no objections, I'll move forward with Koji's suggestion.

Thanks,
Mike

Re: NIFI-3641

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Mike,

I like the idea of adding gRPC as an option for NiFi to communicate
with other NiFi instances (s2s) or with other server endpoints that can
talk via gRPC.

I implemented HTTP for s2s before. It was not an easy task (at
least for me) to make the new protocol align with the existing
terminology and behavior and offer the same level of support.
We would need to implement both the s2s client and server sides, and that
would require a huge effort.

I personally prefer starting with option 2 (enabling flow file sharing
to arbitrary external services via gRPC). We could probably start with a
simple gRPC client processor similar to InvokeHTTP.
Then we would expand our support by adding server-side processors
similar to HandleHttpRequest/HandleHttpResponse.
After that, we could add support for gRPC-style load balancing, to
distribute requests among the NiFi nodes in a cluster that are running
HandleGRPCRequest/Response.
At that point, we would need a load balancer as described in this design
document:
https://github.com/grpc/grpc/blob/master/doc/load-balancing.md
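
To make the first step of that plan a little more concrete, here is a
minimal sketch of the shape a gRPC client processor might take: it hands a
flow-file-like message to a service and routes on the outcome. Everything
in it is an assumption for illustration. FlowFileMessage, FlowFileService
and the relationship names "success", "failure" and "retry" are
hypothetical, and the interface only stands in for a generated gRPC stub;
no NiFi or gRPC API is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Sketch only: every name here is hypothetical, not a NiFi or gRPC API. */
final class GrpcClientProcessorSketch {

    /** Minimal stand-in for a flow file: attributes plus content bytes. */
    static final class FlowFileMessage {
        final Map<String, String> attributes;
        final byte[] content;

        FlowFileMessage(Map<String, String> attributes, byte[] content) {
            this.attributes = Map.copyOf(attributes);
            this.content = content.clone();
        }
    }

    /** Stand-in for a generated gRPC stub; a real one would call a remote service. */
    interface FlowFileService {
        boolean send(FlowFileMessage message);
    }

    /** Sends one message and picks a relationship name from the outcome. */
    static String route(FlowFileService service, FlowFileMessage message) {
        try {
            return service.send(message) ? "success" : "failure";
        } catch (RuntimeException e) {
            // Treat transport-level errors as retryable in this sketch.
            return "retry";
        }
    }

    public static void main(String[] args) {
        FlowFileMessage message = new FlowFileMessage(
                Map.of("filename", "example.txt"),
                "hello".getBytes(StandardCharsets.UTF_8));
        // An in-memory service that accepts any non-empty content.
        FlowFileService accepting = m -> m.content.length > 0;
        System.out.println(route(accepting, message));
    }
}
```

A server-side counterpart, the second step of the plan, would implement the
same hypothetical interface on the receiving end.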

If we go this route, we will have more detailed knowledge of
how gRPC works and a clearer idea of how we can apply it to the NiFi s2s
mechanism.

I may be wrong since I just started reading about gRPC technology.

Thanks,
Koji

On Wed, Jun 7, 2017 at 11:29 AM, Michael Hogue
<mi...@gmail.com> wrote:
> Indeed, Tony. Thanks for clearing that up.
>
> The first option (gRPC for s2s) may offer an opportunity to leverage a
> library for some of the things you'd likely care about with distributed
> comms, such as load balancing, rather than implementing that ourselves.
> gRPC has pluggable load balancing, authentication, and transport protocols,
> so you're still free to provide your own implementation.
>
> I think the latter option (enabling flow file sharing to arbitrary external
> services via gRPC) may open a number of additional doors not previously
> available, with TensorFlow as the motivator.
>
> I'm happy to choose one of these things, but I thought it'd be wise to open
> the conversation to the dev list prior to starting.
>
> Thanks,
> Mike
>
> On Tue, Jun 6, 2017 at 8:22 PM Tony Kurc <tr...@gmail.com> wrote:
>
>> Joe, the ticket [1] mentions TensorFlow, implying, to some extent, that
>> option A from my email wouldn't cut the mustard.
>>
>> 1. https://issues.apache.org/jira/browse/NIFI-3641
>>
>> On Jun 6, 2017 8:15 PM, "Joe Witt" <jo...@gmail.com> wrote:
>>
>> > Mike, Tony,
>> >
>> > This certainly sounds interesting and from reading through the
>> > motivations/design behind it there are clearly some well thought out
>> > reasons for it.
>> >
>> > For site-to-site support I can see advantages for interoperability.
>> > For other factors it would be good to identify limitations of the
>> > current options (raw sockets/http) so that we can clearly and
>> > measurable improve gaps for certain use cases.  Would be good to hear
>> > any of those you have in mind.
>> >
>> > Outside of site-to-site it seems like it could make sense for a
>> > processor configured with a given gRPC/proto being able to talk to
>> > another service to share data.  Is that planned as part of this effort
>> > or is that just what the referenced JIRA was about?
>> >
>> > In either case this seems like one heck of an initial area to
>> > contribute to and a good discussion point!  Thanks
>> >
>> > Joe
>> >
>> > http://www.grpc.io/blog/principles
>> >
>> > On Tue, Jun 6, 2017 at 7:21 PM, Tony Kurc <tr...@gmail.com> wrote:
>> > > Mike,
>> > > I think what you're saying is you are debating two options:
>> > >
>> > > A) gRPC as a transport mechanism and support the deployment use cases
>> > from
>> > > the HTTP s2s document [1] to include using nifi-specific peer selection
>> > in
>> > > the client if the destination is a cluster.
>> > > B) Building a different implementation with an additional deployment
>> > case,
>> > > which is sending from a client (in the diagrams as NiFi site-to-site
>> > > client) to a cluster which isn't NiFi and delegating peer selection "to
>> > > gRPC" which lightens what the receiving cluster would have to
>> implement?
>> > >
>> > > Sound pretty close to the decision you're looking for input on?
>> > >
>> > > 1.
>> > > https://cwiki.apache.org/confluence/display/NIFI/
>> > Support+HTTP%28S%29+as+a+transport+mechanism+for+Site-
>> > to-Site#SupportHTTP(S)asatransportmechanismforSite-
>> > to-Site-Deploymentexamples
>> > >
>> > > On Tue, Jun 6, 2017 at 2:28 PM, Michael Hogue <
>> > michael.p.hogue89@gmail.com>
>> > > wrote:
>> > >
>> > >> All,
>> > >>
>> > >>    As an initial dive into nifi i elected to take a stab at NIFI-3641
>> > >> <https://issues.apache.org/jira/browse/NIFI-3641> (in particular, the
>> > >> site-to-site bit), but i've got a few high level questions before
>> taking
>> > >> off down a path.
>> > >>
>> > >>    Would it be more desirable to have a gRPC site-to-site mechanism or
>> > the
>> > >> ability to generally interact with external gRPC services, such as
>> > tensor
>> > >> flow, in a processor? That might help guide the approach I take in
>> > >> addressing the issue. I'm not entirely sure it's possible to do both
>> > with
>> > >> the same solution since the current site-to-site API assumes the other
>> > end
>> > >> is a nifi. On top of that, gRPC provides a number of load balancing,
>> > remote
>> > >> gRPC server (i.e. peer) selection, and serialization mechanisms that
>> the
>> > >> current site-to-site implementation does itself.
>> > >>
>> > >>    What i'm looking for here are some thoughts and/or guidance on a
>> > >> recommended approach and intended goal.
>> > >>
>> > >> Thanks,
>> > >> Mike
>> > >>
>> >
>>

Re: NIFI-3641

Posted by Michael Hogue <mi...@gmail.com>.
Indeed, Tony. Thanks for clearing that up.

The first option (gRPC for s2s) may offer an opportunity to leverage a
library for some of the things you'd likely care about with distributed
comms, such as load balancing, rather than implementing that ourselves.
gRPC has pluggable load balancing, authentication, and transport protocols,
so you're still free to provide your own implementation.

I think the latter option (enabling flow file sharing to arbitrary external
services via gRPC) may open a number of doors that weren't previously
available, with TensorFlow as the motivating use case.

I'm happy to choose one of these things, but I thought it'd be wise to open
the conversation to the dev list prior to starting.

Thanks,
Mike

On Tue, Jun 6, 2017 at 8:22 PM Tony Kurc <tr...@gmail.com> wrote:

> Joe, the ticket [1] mentions tensorflow, implying, to some extent, that
> option A from my email wouldn't cut the mustard.
>
> 1. https://issues.apache.org/jira/browse/NIFI-3641
>

Re: NIFI-3641

Posted by Tony Kurc <tr...@gmail.com>.
Joe, the ticket [1] mentions tensorflow, implying, to some extent, that
option A from my email wouldn't cut the mustard.

1. https://issues.apache.org/jira/browse/NIFI-3641

On Jun 6, 2017 8:15 PM, "Joe Witt" <jo...@gmail.com> wrote:

> Mike, Tony,
>
> This certainly sounds interesting, and from reading through the
> motivations/design behind it there are clearly some well-thought-out
> reasons for it.
>
> For site-to-site support I can see advantages for interoperability.
> For other factors it would be good to identify limitations of the
> current options (raw sockets/http) so that we can clearly and
> measurably close gaps for certain use cases.  Would be good to hear
> any of those you have in mind.
>
> Outside of site-to-site, it seems like it could make sense for a
> processor configured with a given gRPC/proto definition to be able to
> talk to another service to share data.  Is that planned as part of this
> effort or is that just what the referenced JIRA was about?
>
> In either case this seems like one heck of an initial area to
> contribute to and a good discussion point!  Thanks
>
> Joe
>
> http://www.grpc.io/blog/principles
>

Re: NIFI-3641

Posted by Joe Witt <jo...@gmail.com>.
Mike, Tony,

This certainly sounds interesting, and from reading through the
motivations/design behind it there are clearly some well-thought-out
reasons for it.

For site-to-site support I can see advantages for interoperability.
For other factors it would be good to identify limitations of the
current options (raw sockets/http) so that we can clearly and
measurably close gaps for certain use cases.  Would be good to hear
any of those you have in mind.

Outside of site-to-site, it seems like it could make sense for a
processor configured with a given gRPC/proto definition to be able to
talk to another service to share data.  Is that planned as part of this
effort or is that just what the referenced JIRA was about?

In either case this seems like one heck of an initial area to
contribute to and a good discussion point!  Thanks

Joe

http://www.grpc.io/blog/principles

On Tue, Jun 6, 2017 at 7:21 PM, Tony Kurc <tr...@gmail.com> wrote:
> Mike,
> I think what you're saying is you are debating two options:
>
> A) Using gRPC as a transport mechanism and supporting the deployment use
> cases from the HTTP s2s document [1], including NiFi-specific peer
> selection in the client if the destination is a cluster.
> B) Building a different implementation with an additional deployment case:
> sending from a client (shown in the diagrams as the NiFi site-to-site
> client) to a cluster which isn't NiFi, and delegating peer selection "to
> gRPC", which lightens what the receiving cluster would have to implement.
>
> Sound pretty close to the decision you're looking for input on?
>
> 1.
> https://cwiki.apache.org/confluence/display/NIFI/Support+HTTP%28S%29+as+a+transport+mechanism+for+Site-to-Site#SupportHTTP(S)asatransportmechanismforSite-to-Site-Deploymentexamples
>

Re: NIFI-3641

Posted by Tony Kurc <tr...@gmail.com>.
Mike,
I think what you're saying is you are debating two options:

A) Using gRPC as a transport mechanism and supporting the deployment use
cases from the HTTP s2s document [1], including NiFi-specific peer
selection in the client if the destination is a cluster.
B) Building a different implementation with an additional deployment case:
sending from a client (shown in the diagrams as the NiFi site-to-site
client) to a cluster which isn't NiFi, and delegating peer selection "to
gRPC", which lightens what the receiving cluster would have to implement.

Sound pretty close to the decision you're looking for input on?

1.
https://cwiki.apache.org/confluence/display/NIFI/Support+HTTP%28S%29+as+a+transport+mechanism+for+Site-to-Site#SupportHTTP(S)asatransportmechanismforSite-to-Site-Deploymentexamples

On Tue, Jun 6, 2017 at 2:28 PM, Michael Hogue <mi...@gmail.com>
wrote:

> All,
>
>    As an initial dive into nifi i elected to take a stab at NIFI-3641
> <https://issues.apache.org/jira/browse/NIFI-3641> (in particular, the
> site-to-site bit), but i've got a few high level questions before taking
> off down a path.
>
>    Would it be more desirable to have a gRPC site-to-site mechanism or the
> ability to generally interact with external gRPC services, such as tensor
> flow, in a processor? That might help guide the approach I take in
> addressing the issue. I'm not entirely sure it's possible to do both with
> the same solution since the current site-to-site API assumes the other end
> is a nifi. On top of that, gRPC provides a number of load balancing, remote
> gRPC server (i.e. peer) selection, and serialization mechanisms that the
> current site-to-site implementation does itself.
>
>    What i'm looking for here are some thoughts and/or guidance on a
> recommended approach and intended goal.
>
> Thanks,
> Mike
>