Posted to user@flink.apache.org by "Alam, Zeeshan" <Ze...@fmr.com> on 2016/08/19 13:50:00 UTC

Flink Cluster setup

Hi All,

I have set up a Flink standalone cluster with one master and two slaves, all RedHat-7 machines. In the master dashboard at http://flink-master:8081/ I can see 2 Task Managers and 8 task slots, as I have set taskmanager.numberOfTaskSlots: 4 in flink-conf.yaml on each of the slaves.

Now, when I first ran my program with no parallelism specified, I got the exception java.io.IOException: Could not connect to BlobServer at address /master-node-ip:49313. So I unblocked port 49313 in the firewall, and then my program ran successfully using a single task slot. I have a couple of questions regarding this:


1.       How does Flink use port 49313? Is this port number arbitrary? How would I know which port to unblock before running my program?



2.       I wanted my job to utilize all my task slots, so I ran my program using ./bin/flink run -p 8 myjars/flinkstream-flinkcluster.jar. I again got exceptions:

java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'slave1-url.com/slave1-ip:45835' has failed. This might indicate that the remote task manager has been lost. And

java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'slave2-url.com/slave2-ip:45086' has failed. This might indicate that the remote task manager has been lost.

Again, after unblocking these ports on the respective machines, my program ran successfully, utilizing all 8 task slots.



What I want to know is how Flink behaves now that all these 8 tasks are reading from the same Kafka topic. Will each task get the same data from the Kafka topic, or will each task receive data separate from the others? What I want is to distribute the events from the same Kafka topic evenly across all the available task slots. Is this the proper way to do so?


Thanks & Regards
Zeeshan Alam



Re: Flink Cluster setup

Posted by Robert Metzger <rm...@apache.org>.
Hi,

Flink allocates the blob server at an ephemeral port, so it'll change every
time you start Flink.
However, the "blob.server.port" configuration [1] allows you to use a
predefined port, or even a port range.
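
For reference, in flink-conf.yaml that could look like the following (the port values here are just example choices, not defaults you must use):

```yaml
# Fix the BlobServer to a single known port so the firewall rule is stable.
# The default of 0 means an ephemeral port, chosen fresh on each start.
blob.server.port: 6124

# Alternatively, allow a range and open that range in the firewall:
# blob.server.port: 50100-50200
```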

If your Kafka topic has only one partition, only one of the 8 tasks will
read the data. Flink will not read the same data multiple times.
If the number of partitions is higher than the number of tasks, each task
will read multiple partitions.
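
As a rough illustration of that assignment (a round-robin sketch, not the exact Flink internals), partition i can be thought of as going to source subtask i % parallelism:

```java
public class PartitionAssignment {

    // Round-robin mapping of Kafka partitions to parallel source subtasks.
    // This illustrates the general idea described above, not Flink's exact
    // implementation: partition i is read by subtask i % parallelism.
    public static int subtaskFor(int partition, int parallelism) {
        return partition % parallelism;
    }

    public static void main(String[] args) {
        int parallelism = 8;
        // With 12 partitions and 8 subtasks, subtasks 0-3 each read two
        // partitions and subtasks 4-7 each read one.
        for (int p = 0; p < 12; p++) {
            System.out.println("partition " + p + " -> subtask " + subtaskFor(p, parallelism));
        }
        // With only 1 partition, everything lands on subtask 0 and the
        // other 7 subtasks receive no data, matching the behavior above.
    }
}
```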

If your data is evenly distributed among the partitions, you don't need to
worry about that. As soon as you do a keyBy() operation, the data will be
shuffled over the network, according to the key.
If you have unevenly distributed data, you can add the
dataStream.rebalance() call in the API.
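
A minimal sketch of that call against the Flink 1.x DataStream API (the topic name, broker address, and group id below are placeholders, not values from this thread; this fragment needs the Flink and Kafka-connector jars on the classpath):

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaRebalanceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-broker:9092"); // placeholder
        props.setProperty("group.id", "my-group");                   // placeholder

        DataStream<String> source = env.addSource(
                new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        // rebalance() redistributes records round-robin across all parallel
        // downstream tasks, evening out skew from the source partitions.
        // (A keyBy() instead would shuffle by key, as described above.)
        source.rebalance().print();

        env.execute("kafka-rebalance-sketch");
    }
}
```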

Let me know if you have more questions.


Regards,
Robert



[1] https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html
