Posted to user@flink.apache.org by gerardg <ge...@talaia.io> on 2019/03/07 14:24:04 UTC

Timestamp synchronized message consumption across kafka partitions

I'm wondering if there is a way to avoid consuming too fast from partitions
that do not have as much data as the others in the same topic, by keeping
them more or less synchronized by their ingestion timestamps, similar to what
Kafka Streams does:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization
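
To make the idea concrete, here is a minimal sketch of timestamp-synchronized consumption: always read next from the partition whose head record has the smallest timestamp. This is only an illustration of the general idea, not Kafka Streams' or Flink's actual implementation. The function name and the in-memory lists standing in for partitions are assumptions made for the example, and each partition is assumed to be ordered by timestamp.

```python
import heapq


def consume_in_timestamp_order(partitions):
    """Yield (partition_id, timestamp, value) across all partitions,
    always taking the partition whose next record has the smallest timestamp.

    `partitions` maps a partition id to a list of (timestamp, value) records
    that is already sorted by timestamp (hypothetical in-memory stand-in
    for real Kafka partitions).
    """
    iterators = {pid: iter(records) for pid, records in partitions.items()}
    heap = []

    # Seed the heap with the head record of every non-empty partition.
    for pid, it in iterators.items():
        head = next(it, None)
        if head is not None:
            heapq.heappush(heap, (head[0], pid, head[1]))

    while heap:
        ts, pid, value = heapq.heappop(heap)
        yield pid, ts, value
        # Refill from the partition we just consumed from.
        nxt = next(iterators[pid], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], pid, nxt[1]))


partitions = {
    0: [(1, "a"), (5, "b"), (9, "c")],  # partition with more data
    1: [(2, "x"), (3, "y")],            # sparse partition
}
merged = list(consume_in_timestamp_order(partitions))
```

With this policy no partition runs ahead of the others in event time, because a record is only consumed when it is the oldest one available. A real consumer also has to decide what to do when a partition is temporarily empty, which the sketch does not cover.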

We are having an issue where partitions with less data are consumed very
fast, which creates a lot of windows that can't be triggered until the
partitions with more data are consumed and the watermark advances. It
seems that this issue should be quite common, but we can't find any
standard solution to it. Maybe it is just that our partitions are too
unbalanced, but still, without a way to bound the skew between
partitions (for example, when processing accumulated data), it seems like a
potential source of problems.
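
As a rough sketch of what bounding the skew could look like: track a local watermark per partition, take the minimum as the overall watermark, and stop reading from any partition that is ahead of it by more than a configured limit. The names `partitions_to_pause` and `max_skew_ms` are hypothetical, and this is not an existing Flink or Kafka API, only an illustration of the logic.

```python
def partitions_to_pause(local_watermarks, max_skew_ms):
    """Return the ids of partitions that are too far ahead in event time.

    `local_watermarks` maps a partition id to the highest timestamp (ms)
    read from it so far. The overall watermark is the minimum of these,
    since windows can only fire once every partition has passed them.
    Both names are assumptions made for this example.
    """
    if not local_watermarks:
        return set()
    global_watermark = min(local_watermarks.values())
    return {
        pid
        for pid, wm in local_watermarks.items()
        if wm - global_watermark > max_skew_ms
    }


# p1 has little data and has raced two minutes ahead of p0.
watermarks = {"p0": 1_000, "p1": 120_000, "p2": 30_000}
paused = partitions_to_pause(watermarks, max_skew_ms=60_000)
```

Here only "p1" would be paused until "p0" catches up, which limits how many windows can be open at once. The hard part in a distributed job is that the partitions are read by different subtasks, so the minimum has to be shared between them somehow.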

Does anyone have an idea or suggestion for dealing with this issue?

Thanks,

Gerard

Re: Timestamp synchronized message consumption across kafka partitions

Posted by Gerard Garcia <ge...@talaia.io>.
I'll answer my own question. I guess the most viable option for now is to wait for
the work in
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html
