You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Mohit Anchlia <mo...@gmail.com> on 2017/02/03 01:09:13 UTC

Parallelism and Partitioning

What is the granularity of parallelism in flink? For eg: if I am reading
from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2
consumer threads and allocates it on 2 separate task managers?

Also, it would be good to understand the difference between parallelism and
partitioning that also could be distributed across task managers.

Re: Parallelism and Partitioning

Posted by Mohit Anchlia <mo...@gmail.com>.
Any information on this would be helpful.

On Thu, Feb 2, 2017 at 5:09 PM, Mohit Anchlia <mo...@gmail.com>
wrote:

> What is the granularity of parallelism in flink? For eg: if I am reading
> from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2
> consumer threads and allocates it on 2 separate task managers?
>
> Also, it would be good to understand the difference between parallelism
> and partitioning that also could be distributed across task managers.
>

Re: Parallelism and Partitioning

Posted by Ufuk Celebi <uc...@apache.org>.
On Fri, Feb 3, 2017 at 2:09 AM, Mohit Anchlia <mo...@gmail.com> wrote:
> What is the granularity of parallelism in flink? For eg: if I am reading
> from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2
> consumer threads and allocates it on 2 separate task managers?

Yes, this will instantiate two instances of the Kafka source, the map
operator, and the sink. These parallel sub pipelines will be scheduled
to separate slots (that might happen to on the same TM). See here for
more details: https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html

Partitioning of state happens in key groups, which define a range of
keys. A single subtask is usually responsible for more than a single
key group. The key groups are the units of rescaling your program.
This is configurable via the setMaxParallelism() method on the
environment.