Posted to user@flink.apache.org by Andrew Kowpak <an...@ssimwave.com> on 2018/10/04 19:58:04 UTC

Kafka Per-Partition Watermarks

Hi all,

I apologize if this has been discussed to death in the past, but I'm
finding myself very confused, and Google is not proving helpful.

Based on the documentation, I understand that if there are idle partitions
in a Kafka stream, watermarks will not advance for the entire application.
I was hoping that setting parallelism = the number of partitions would let
me work around the issue, but this didn't work.  I'm totally willing to
accept that if I have idle partitions, my windowed partitions won't work;
however, I would really like to understand why setting the parallelism
didn't work.  If someone can explain, or perhaps point me to documentation
or code, it would be very much appreciated.

Thanks.

-- 
*Andrew Kowpak P.Eng* *Sr. Software Engineer*
(519)  489 2688 | SSIMWAVE Inc.
402-140 Columbia Street West, Waterloo ON

Re: Kafka Per-Partition Watermarks

Posted by Taher Koitawala <ta...@gslab.com>.
Hey Andrew,
We face the same problem in our application, where some of the Kafka
partitions are empty. In this case, what we do is call the rebalance()
method on the source streams.

Ex: DataStream<T> srcStream = env.addSource(new
FlinkKafkaConsumer09<>(topic, ser, props));
DataStream<T> rebalanced = srcStream.rebalance();

Note that rebalance() returns a new stream, so downstream operators must be
applied to its result. Since rebalance sends records to all tasks in a
round-robin fashion, we do not have to worry about empty partitions or about
matching our parallelism to the number of Kafka partitions. After rebalance
you can have a parallelism higher than the number of Kafka partitions and
windowing will work just fine.
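
To illustrate the round-robin point, here is a minimal plain-Java sketch (no
Flink dependency). The class and method names are hypothetical, and the model
is simplified: each source subtask hands its i-th record to downstream task
i % numTasks. It only models how records are spread. It assumes timestamps
and watermarks are assigned downstream of the rebalance, since a source
subtask that reads only an idle partition would still emit no watermarks of
its own.

```java
public class RoundRobinSketch {

    // Hypothetical model of rebalance(): each source subtask sends its
    // records to the downstream tasks in round-robin order.
    static int[] receivedPerTask(int[] recordsPerSource, int numTasks) {
        int[] received = new int[numTasks];
        for (int records : recordsPerSource) {
            for (int i = 0; i < records; i++) {
                received[i % numTasks]++;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // Three Kafka partitions (one source subtask each); the third is idle.
        int[] recordsPerSource = {8, 8, 0};
        // Downstream parallelism is higher than the partition count.
        int[] received = receivedPerTask(recordsPerSource, 4);
        for (int task = 0; task < received.length; task++) {
            System.out.println("task " + task + " received " + received[task] + " records");
        }
    }
}
```

In this model every downstream task receives records even though one
partition is idle, which is why a watermark assigner placed after the
rebalance can keep advancing on every task.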

Thanks,
Taher Koitawala

Re: Kafka Per-Partition Watermarks

Posted by Andrew Kowpak <an...@ssimwave.com>.
Yes, my job does do a keyBy.  It never occurred to me that keyBy would
distribute data from different partitions to different tasks, but now
that you mention it, it actually makes perfect sense.  Thank you for the
help.

On Thu, Oct 4, 2018 at 5:11 PM Elias Levy <fe...@gmail.com>
wrote:

> Does your job perform a keyBy or broadcast that would result in data from
> different partitions being distributed among tasks?  If so, then that would
> be the cause.

Re: Kafka Per-Partition Watermarks

Posted by Elias Levy <fe...@gmail.com>.
Does your job perform a keyBy or broadcast that would result in data from
different partitions being distributed among tasks?  If so, then that would
be the cause.
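
A minimal plain-Java sketch of why this matters (no Flink dependency; the
class and method names are hypothetical). It relies on the rule that an
operator's event-time clock is the minimum of the latest watermarks received
on its input channels, and on keyBy connecting every source subtask to every
downstream task. A source subtask that reads only an idle partition never
emits a watermark, which is modelled here as Long.MIN_VALUE.

```java
import java.util.Arrays;

public class WatermarkMinSketch {

    // An operator's watermark is the minimum over its input channels.
    static long operatorWatermark(long[] channelWatermarks) {
        return Arrays.stream(channelWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // One source subtask per Kafka partition; the third partition is
        // idle, so its subtask has not emitted any watermark yet.
        long[] sourceWatermarks = {1_000L, 2_000L, Long.MIN_VALUE};

        // After keyBy, each window task has an input channel from every
        // source subtask, so each one sees all three watermarks.
        for (int task = 0; task < 3; task++) {
            long wm = operatorWatermark(sourceWatermarks);
            System.out.println("window task " + task + " watermark = " + wm);
        }
    }
}
```

Under these assumptions, matching the parallelism to the partition count does
not isolate the idle partition: every window task still waits on the subtask
that never advances.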
