You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Sachin Goel <sa...@gmail.com> on 2015/07/17 18:19:29 UTC

Communicating between nodes at runtime

Hi all
Is it possible to send and receive key,value pairs at runtime? I would like
to broadcast values at runtime so they are available on every node. This
somehow seems essential for an implementation of Asynchronous batch
iterations.

What I would like is to have two functions in the RuntimeContext,
broadcast(key,value) and receive(key).
If such a thing doesn't exist now, how should I approach implementing it?

Cheers!
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

Re: Communicating between nodes at runtime

Posted by Maximilian Michels <mx...@apache.org>.
Hi Sachin,

Do you know about Broadcast Variables? They allow you to transfer a DataSet
to all nodes.

https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#broadcast-variables

Let us know if that fits your needs.

Cheers,
Max

On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <sa...@gmail.com>
wrote:

> Hi all
> Is it possible to send and receive key,value pairs at runtime? I would like
> to broadcast values at runtime so they are available on every node. This
> somehow seems essential for an implementation of Asynchronous batch
> iterations.
>
> What I would like is to have two functions in the RuntimeContext,
> broadcast(key,value) and receive(key).
> If such a thing doesn't exist now, how should I approach implementing it?
>
> Cheers!
> Sachin
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>

Re: Communicating between nodes at runtime

Posted by Sachin Goel <sa...@gmail.com>.
@Stephan, why would there be a problem with the reliability of updates?

Also, what would be the best way to achieve this functionality? Preferably
using the current API.

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Tue, Jul 21, 2015 at 4:17 PM, Stephan Ewen <se...@apache.org> wrote:

> I see the use case for that, but I don't think that this should be realized
> through Flink's network stack. The network stack is designed for continuous
> streams, backpressure, and configurable throughput/latency. I think it is
> not the best solution for asynchronous messaging.
>
> Also, for the ML field, you will want to be able to send "fire and forget"
> messages, as exact and reliable updates are usually necessary for
> stochastic algorithms.
>
> To me, this calls for a different communication fabric.
>
>
> On Mon, Jul 20, 2015 at 5:22 PM, Sachin Goel <sa...@gmail.com>
> wrote:
>
> > @Max, broadcast variables have to be declared before the program is
> > executed. I want to be able to do something whereby I can send data
> inside
> > a map operation to all nodes. This would perhaps have an effect similar
> to
> > the recent discussion on making Accumulators available before Job
> > completion. [I don't know if it has been merged yet.]
> >
> > @Stephan, I was thinking more in the direction of adding a task at every
> > worker which works as an Event handler to receive data from other nodes.
> I
> > got the idea after going through the iterative runtime module which has a
> > separate task for handling synchronization. I followed it up with going
> > through the optimizer routine wherein it is added to some *auxiliary
> > vertices *list. Can you elaborate on that?
> >
> > Further, is it possible to add an event handler to the runtime context
> > itself? I wrote the boilerplate stuff for this but I haven't defined the
> > channels where the event would be written to the network so other workers
> > can also listen to it.  [
> > https://github.com/sachingoel0101/flink/tree/async_iter]. I want to be
> > able
> > to call broadcast and receive as getRuntimeContext.broadcast and
> > getRuntimeContext.receive, which ensures complete data transmission
> across
> > nodes, even at runtime.
> > I'm not sure how to add a separate task which handles event over the
> > network like the IterationSynchronization task [which is the best use I
> > could find of using event handlers].
> >
> > Let me know your thoughts. Or if you can think of a better way to achieve
> > sharing of data which is *only *evaluated at runtime.
> >
> > Cheers!
> > Sachin
> >
> > -- Sachin Goel
> > Computer Science, IIT Delhi
> > m. +91-9871457685
> >
> > On Mon, Jul 20, 2015 at 3:34 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > You are probably looking for a parameter server tool.
> > >
> > > How about setting up one of these memory grids to use that? Apache
> > Ignite,
> > > or Apache Geode, or one of those.
> > >
> > > On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <sachingoel0101@gmail.com
> >
> > > wrote:
> > >
> > > > Hi all
> > > > Is it possible to send and receive key,value pairs at runtime? I
> would
> > > like
> > > > to broadcast values at runtime so they are available on every node.
> > This
> > > > somehow seems essential for an implementation of Asynchronous batch
> > > > iterations.
> > > >
> > > > What I would like is to have two functions in the RuntimeContext,
> > > > broadcast(key,value) and receive(key).
> > > > If such a thing doesn't exist now, how should I approach implementing
> > it?
> > > >
> > > > Cheers!
> > > > Sachin
> > > > -- Sachin Goel
> > > > Computer Science, IIT Delhi
> > > > m. +91-9871457685
> > > >
> > >
> >
>

Re: Communicating between nodes at runtime

Posted by Stephan Ewen <se...@apache.org>.
I see the use case for that, but I don't think that this should be realized
through Flink's network stack. The network stack is designed for continuous
streams, backpressure, and configurable throughput/latency. I think it is
not the best solution for asynchronous messaging.

Also, for the ML field, you will want to be able to send "fire and forget"
messages, as exact and reliable updates are usually necessary for
stochastic algorithms.

To me, this calls for a different communication fabric.


On Mon, Jul 20, 2015 at 5:22 PM, Sachin Goel <sa...@gmail.com>
wrote:

> @Max, broadcast variables have to be declared before the program is
> executed. I want to be able to do something whereby I can send data inside
> a map operation to all nodes. This would perhaps have an effect similar to
> the recent discussion on making Accumulators available before Job
> completion. [I don't know if it has been merged yet.]
>
> @Stephan, I was thinking more in the direction of adding a task at every
> worker which works as an Event handler to receive data from other nodes. I
> got the idea after going through the iterative runtime module which has a
> separate task for handling synchronization. I followed it up with going
> through the optimizer routine wherein it is added to some *auxiliary
> vertices *list. Can you elaborate on that?
>
> Further, is it possible to add an event handler to the runtime context
> itself? I wrote the boilerplate stuff for this but I haven't defined the
> channels where the event would be written to the network so other workers
> can also listen to it.  [
> https://github.com/sachingoel0101/flink/tree/async_iter]. I want to be
> able
> to call broadcast and receive as getRuntimeContext.broadcast and
> getRuntimeContext.receive, which ensures complete data transmission across
> nodes, even at runtime.
> I'm not sure how to add a separate task which handles event over the
> network like the IterationSynchronization task [which is the best use I
> could find of using event handlers].
>
> Let me know your thoughts. Or if you can think of a better way to achieve
> sharing of data which is *only *evaluated at runtime.
>
> Cheers!
> Sachin
>
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>
> On Mon, Jul 20, 2015 at 3:34 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > You are probably looking for a parameter server tool.
> >
> > How about setting up one of these memory grids to use that? Apache
> Ignite,
> > or Apache Geode, or one of those.
> >
> > On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <sa...@gmail.com>
> > wrote:
> >
> > > Hi all
> > > Is it possible to send and receive key,value pairs at runtime? I would
> > like
> > > to broadcast values at runtime so they are available on every node.
> This
> > > somehow seems essential for an implementation of Asynchronous batch
> > > iterations.
> > >
> > > What I would like is to have two functions in the RuntimeContext,
> > > broadcast(key,value) and receive(key).
> > > If such a thing doesn't exist now, how should I approach implementing
> it?
> > >
> > > Cheers!
> > > Sachin
> > > -- Sachin Goel
> > > Computer Science, IIT Delhi
> > > m. +91-9871457685
> > >
> >
>

Re: Communicating between nodes at runtime

Posted by Sachin Goel <sa...@gmail.com>.
@Max, broadcast variables have to be declared before the program is
executed. I want to be able to do something whereby I can send data inside
a map operation to all nodes. This would perhaps have an effect similar to
the recent discussion on making Accumulators available before Job
completion. [I don't know if it has been merged yet.]

@Stephan, I was thinking more in the direction of adding a task at every
worker which works as an Event handler to receive data from other nodes. I
got the idea after going through the iterative runtime module which has a
separate task for handling synchronization. I followed it up with going
through the optimizer routine wherein it is added to some *auxiliary
vertices *list. Can you elaborate on that?

Further, is it possible to add an event handler to the runtime context
itself? I wrote the boilerplate stuff for this but I haven't defined the
channels where the event would be written to the network so other workers
can also listen to it.  [
https://github.com/sachingoel0101/flink/tree/async_iter]. I want to be able
to call broadcast and receive as getRuntimeContext.broadcast and
getRuntimeContext.receive, which ensures complete data transmission across
nodes, even at runtime.
I'm not sure how to add a separate task which handles event over the
network like the IterationSynchronization task [which is the best use I
could find of using event handlers].

Let me know your thoughts. Or if you can think of a better way to achieve
sharing of data which is *only *evaluated at runtime.

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Mon, Jul 20, 2015 at 3:34 PM, Stephan Ewen <se...@apache.org> wrote:

> You are probably looking for a parameter server tool.
>
> How about setting up one of these memory grids to use that? Apache Ignite,
> or Apache Geode, or one of those.
>
> On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <sa...@gmail.com>
> wrote:
>
> > Hi all
> > Is it possible to send and receive key,value pairs at runtime? I would
> like
> > to broadcast values at runtime so they are available on every node. This
> > somehow seems essential for an implementation of Asynchronous batch
> > iterations.
> >
> > What I would like is to have two functions in the RuntimeContext,
> > broadcast(key,value) and receive(key).
> > If such a thing doesn't exist now, how should I approach implementing it?
> >
> > Cheers!
> > Sachin
> > -- Sachin Goel
> > Computer Science, IIT Delhi
> > m. +91-9871457685
> >
>

Re: Communicating between nodes at runtime

Posted by Stephan Ewen <se...@apache.org>.
You are probably looking for a parameter server tool.

How about setting up one of these memory grids to use that? Apache Ignite,
or Apache Geode, or one of those.

On Fri, Jul 17, 2015 at 6:19 PM, Sachin Goel <sa...@gmail.com>
wrote:

> Hi all
> Is it possible to send and receive key,value pairs at runtime? I would like
> to broadcast values at runtime so they are available on every node. This
> somehow seems essential for an implementation of Asynchronous batch
> iterations.
>
> What I would like is to have two functions in the RuntimeContext,
> broadcast(key,value) and receive(key).
> If such a thing doesn't exist now, how should I approach implementing it?
>
> Cheers!
> Sachin
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>