
Posted to dev@beam.apache.org by Amit Sela <am...@gmail.com> on 2016/12/05 07:48:41 UTC

Increase stream parallelism after reading from UnboundedSource

Hi all,

I have a general question about how stream-processing frameworks/engines
usually behave in the following scenario:

Say I have a Pipeline that consumes from 1 Kafka partition, so that my
initial (optimal) parallelism is 1 as well.

For any downstream computation, is it common for stream processors to
"fan-out/parallelise" the stream by shuffling the data into more
streams/partitions/bundles?

Thanks,
Amit
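A minimal sketch of the fan-out Amit describes. It assumes records carry a key and reads "shuffling" as hash-partitioning on that key. Everything read from the single Kafka partition is spread over N downstream partitions, and records with the same key stay together. The `Event` type, `partitionFor` and `fanOut` are hypothetical names used only for illustration. They are not Beam, Flink or Kafka APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a Beam/Flink/Kafka API): fan out records read from
 * one source partition into numPartitions downstream partitions by key hash.
 */
public class KeyedFanOut {

    /** A record as read from the single Kafka partition (hypothetical type). */
    public static final class Event {
        final String key;
        final String value;

        public Event(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    /** Pick a downstream partition from the key's hash. */
    public static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Distribute events into numPartitions lists; equal keys always land in the same list. */
    public static List<List<Event>> fanOut(List<Event> input, int numPartitions) {
        List<List<Event>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Event e : input) {
            partitions.get(partitionFor(e.key, numPartitions)).add(e);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Event> fromOnePartition = List.of(
                new Event("user-1", "click"),
                new Event("user-2", "view"),
                new Event("user-1", "purchase"),
                new Event("user-3", "click"));
        List<List<Event>> out = fanOut(fromOnePartition, 4);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("partition " + i + ": " + out.get(i));
        }
    }
}
```

Because the partition depends only on the key, each downstream partition sees a given key's records in their original source order. The cost is skew when a few keys dominate the stream.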

Re: Increase stream parallelism after reading from UnboundedSource

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,
I can only speak for Flink: there, you usually fan out/parallelise the
stream after a non-parallel source.

Cheers,
Aljoscha
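Flink's DataStream API offers `rebalance()`, which is documented as distributing elements round-robin to the instances of the next operator. The sketch below illustrates only that round-robin idea in plain Java, one element at a time. The `RoundRobinFanOut` class and its methods are hypothetical and are not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Flink's implementation): round-robin redistribution
 * after a non-parallel source. Element i goes to instance i % parallelism.
 */
public class RoundRobinFanOut<T> {
    private final List<List<T>> instances = new ArrayList<>();
    private int next = 0; // index of the instance that receives the next element

    public RoundRobinFanOut(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
    }

    /** Emit one element (event-at-a-time) to the next downstream instance. */
    public void emit(T element) {
        instances.get(next).add(element);
        next = (next + 1) % instances.size();
    }

    /** What each downstream instance has received so far. */
    public List<List<T>> instances() {
        return instances;
    }

    public static void main(String[] args) {
        RoundRobinFanOut<Integer> fanOut = new RoundRobinFanOut<>(3);
        for (int i = 0; i < 7; i++) {
            fanOut.emit(i);
        }
        System.out.println(fanOut.instances()); // [[0, 3, 6], [1, 4], [2, 5]]
    }
}
```

Round-robin ignores the content, so load stays even across instances. The trade-off is that elements with the same key may end up on different instances.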

Re: Increase stream parallelism after reading from UnboundedSource

Posted by Amit Sela <am...@gmail.com>.

I think it is common in batch (and micro-batch for streaming) because at
any given time you're computing a "chunk" (pick your name... we have lots
of them ;-) ), and slicing up this chunk to distribute it across more CPUs,
if available, is clearly better. But I was wondering about "event-at-a-time"
processors and everything in between, such as bundles that may be of size
1 but might contain more elements.
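A rough sketch of the micro-batch case, assuming one chunk is already in memory. The chunk is sliced into contiguous sub-lists, one per thread. Each slice is processed on a thread pool, and the partial results are combined. Summing integers stands in for real per-element work. `ChunkSlicer`, `slice` and `processChunk` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: slice one micro-batch chunk and process the slices in parallel. */
public class ChunkSlicer {

    /** Split a chunk into at most numSlices contiguous, roughly equal slices (empty slices skipped). */
    public static <T> List<List<T>> slice(List<T> chunk, int numSlices) {
        List<List<T>> slices = new ArrayList<>();
        int size = chunk.size();
        for (int i = 0; i < numSlices; i++) {
            int from = (int) ((long) size * i / numSlices);
            int to = (int) ((long) size * (i + 1) / numSlices);
            if (from < to) {
                slices.add(chunk.subList(from, to));
            }
        }
        return slices;
    }

    /** Sum each slice on its own pool thread (stand-in for real work), then combine the partial sums. */
    public static long processChunk(List<Integer> chunk, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> s : slice(chunk, numThreads)) {
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int v : s) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // wait for each slice and combine
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> chunk = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            chunk.add(i);
        }
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(processChunk(chunk, cpus)); // 5050
    }
}
```

This slicing step does not carry over directly to event-at-a-time processing, where there is no chunk to split up front.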

Re: Increase stream parallelism after reading from UnboundedSource

Posted by Raghu Angadi <ra...@google.com.INVALID>.

On Sun, Dec 4, 2016 at 11:48 PM, Amit Sela <am...@gmail.com> wrote:

> For any downstream computation, is it common for stream processors to
> "fan-out/parallelise" the stream by shuffling the data into more
> streams/partitions/bundles ?
>

I think so. It is pretty common in batch processing too.