Posted to user@flink.apache.org by Prasanna kumar <pr...@gmail.com> on 2023/01/06 06:52:16 UTC

Snappy Compression for Checkpoints

Hello Flink Community,



We are running jobs on Flink version 1.12.7 that read from Kafka, apply
some rules (stored in broadcast state), and then write to Kafka. This is a
very low-latency, high-throughput pipeline, and we have set up
at-least-once semantics.



Checkpoint Configuration Used

   1. We cannot have many duplicates during restarts, so we have set a
   checkpoint interval of 3s. (We cannot increase it any further, since we
   process tens of thousands of records per second.)
   2. The checkpoint target location is AWS S3.
   3. Max concurrent checkpoints is 1.
   4. Time between checkpoints is 500ms.
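
A minimal sketch of how these four settings could be written in
flink-conf.yaml. The S3 path is a placeholder, the mapping of "time between
checkpoints" to the min-pause option is an assumption, and the key names
should be verified against the configuration reference for 1.12.

```yaml
# Sketch only: values come from the list above; the bucket path is hypothetical.
execution.checkpointing.interval: 3s
execution.checkpointing.mode: AT_LEAST_ONCE
execution.checkpointing.max-concurrent-checkpoints: 1
# Assumption: "time between checkpoints" refers to the minimum pause.
execution.checkpointing.min-pause: 500ms
state.checkpoints.dir: s3://example-bucket/flink/checkpoints
```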

Earlier we had around 10 rule objects stored in broadcast state. Recently
we enabled 80 rule objects. Since the increase, we are seeing a lot of
checkpoints in progress. (Earlier we rarely saw this in the metrics
dashboard.) The parallelism of the broadcast function is around 10 and the
present checkpoint size is 64 KB.



Since we expect these rule objects to increase to 1000 and beyond in a
year's time, we are looking at ways to improve checkpoint performance.
We cannot use incremental checkpoints, since they are supported only with
RocksDB and the development effort is somewhat higher. Looking at an easier
solution first, we tried using "SnapshotCompression", but we did not see any
decrease in checkpoint size.
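
For context, snapshot compression is normally switched on with a single
flag. A minimal sketch, assuming the option key below is available in this
Flink version (the programmatic equivalent is
ExecutionConfig#setUseSnapshotCompression):

```yaml
# Assumed key name; verify against the 1.12 configuration reference.
execution.checkpointing.snapshot-compression: true
```

Flink's documentation describes this compression as working at the
granularity of key-groups in keyed state, whereas broadcast state is
operator state.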



We have a few questions on this:

   1. Does SnapshotCompression work in version 1.12.7?
   2. If yes, how much size reduction can we expect when it is enabled,
   and at what size does compression take effect? Is there a threshold
   only above which compression works?



Apart from the questions above, you are welcome to suggest any config
changes that could improve checkpoint performance.



Thanks & Regards,

Prasanna

Re: Snappy Compression for Checkpoints

Posted by Martijn Visser <ma...@apache.org>.
Hi Prasanna,

There is no support for compression in operator state. This can be tracked
under https://issues.apache.org/jira/browse/FLINK-30113

Best regards,

Martijn
