Posted to user@flink.apache.org by Evgeny Kincharov <Ev...@epam.com> on 2017/02/27 13:04:21 UTC

Running streaming job on every node of cluster

Hi,

I have a very simple streaming job, and I want to distribute it to every node of my Flink cluster.

The job is simple:

source (SocketTextStream) -> map -> sink (AsyncHttpSink).

When I increase the parallelism of my job, either when deploying or directly in code, it has no effect because the source can't work in parallel.
As a workaround, I reduced "Task Slots" to 1 on every node and deployed my job as many times as there are nodes in the cluster.
This works as long as I have only one job. If I want to deploy another one in parallel, there are no free slots.
I hope a more convenient way to do this exists; the commands I use are sketched below. Thanks.
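
For reference, this is roughly how I set the parallelism and the slot count (the jar name and the value 4 are placeholders):

bin/flink run -p 4 my-streaming-job.jar

or directly in code:

env.setParallelism(4);

and, for the slot workaround, in conf/flink-conf.yaml on every node:

taskmanager.numberOfTaskSlots: 1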

BR,
Evgeny

Re: Running streaming job on every node of cluster

Posted by Nico Kruber <ni...@data-artisans.com>.
Hi Evgeny,
I tried to reproduce your example with the following code, with another 
console listening via "nc -l 12345":

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);
env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3))
		.map(new MapFunction<String, String>() {
			@Override
			public String map(final String s) throws Exception { return s; }
		})
		.addSink(new DiscardingSink<String>());
env.execute();

This way, I get a source with parallelism 1 and map & sink with parallelism 
2, and the whole program occupies 2 slots, as expected. You can check in the 
web interface of your cluster how many slots are taken after executing one 
instance of your program.
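
If you prefer the command line over the web interface, the slot numbers are 
also exposed via the JobManager's monitoring REST API (host and port below 
are assumptions; 8081 is the default):

curl http://localhost:8081/overview

Among other fields, the response contains "slots-total" and "slots-available".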

How do you set your parallelism?
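
Independent of that: you can also set the parallelism per operator rather 
than for the whole job, so the non-parallel source does not force everything 
else to parallelism 1. A minimal sketch based on the code above (4 is an 
arbitrary value):

env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3))
		.map(new MapFunction<String, String>() {
			@Override
			public String map(final String s) throws Exception { return s; }
		}).setParallelism(4)  // scales the map independently of the source
		.addSink(new DiscardingSink<String>()).setParallelism(4);  // and the sink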


Nico

On Monday, 27 February 2017 14:04:21 CET Evgeny Kincharov wrote:
> Hi,
> 
> I have a very simple streaming job, and I want to distribute it to every
> node of my Flink cluster.
> 
> The job is simple:
> 
> source (SocketTextStream) -> map -> sink (AsyncHttpSink).
> 
> When I increase the parallelism of my job, either when deploying or directly
> in code, it has no effect because the source can't work in parallel. As a
> workaround, I reduced "Task Slots" to 1 on every node and deployed my job as
> many times as there are nodes in the cluster. This works as long as I have
> only one job. If I want to deploy another one in parallel, there are no free
> slots. I hope a more convenient way to do this exists. Thanks.
> 
> BR,
> Evgeny