Posted to user@flink.apache.org by Abdul Rahman <ab...@gmail.com> on 2022/01/22 06:51:10 UTC

Question about MapState size

Hello,

I have a streaming application that has an operator based on
KeyedCoProcessFunction. The operator has a MapState object. I store
some data in this operator with a fixed TTL. I would like to monitor
the size/count of this state over time since it's related to some
operational metrics we want to track. Seems like a simple thing to do,
but I haven't come up with a way to do so.

Given that iterating over the complete map is an expensive operation,
I only plan to do so periodically. The first issue is that the
stream is keyed, so any time I do a count of the MapState, I don't get
the complete size of the state object, but only count pertaining to
the specific key of partition. Is there a way around this?

Secondly, is there a way to monitor RocksDB usage over time? I can
find managed memory metrics, but these do not include the disk space
RocksDB uses. Is there a way to get this from standard Flink metrics,
either task manager or job manager?

Re: Question about MapState size

Posted by Yun Tang <my...@live.com>.
Hi Abdul,

What does "only count pertaining to the specific key of partition" mean? Is the size you are counting that of the map for one specific selected key, or of all the maps in the whole map state?

You can leverage RocksDB's native metrics to monitor RocksDB usage, such as total-sst-files-size [1], which reports the total size of the SST files on disk for each RocksDB instance.

[1] https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#state-backend-rocksdb-metrics-total-sst-files-size
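
For reference, a minimal sketch of how this might look in flink-conf.yaml. Only total-sst-files-size is mentioned above; the other two options are additions for illustration, taken from the same section of the configuration docs, and whether they are useful depends on your setup. The native metrics are disabled by default, and enabling them can add some overhead.

```yaml
# Sketch only: enable selected RocksDB native metrics (all are off by default).

# Total size in bytes of all SST files (the metric referenced in [1]).
state.backend.rocksdb.metrics.total-sst-files-size: true

# Assumption, not from this thread: estimated number of keys in RocksDB.
# This is an estimate and may include entries whose TTL has expired
# but that have not been cleaned up yet.
state.backend.rocksdb.metrics.estimate-num-keys: true

# Assumption, not from this thread: estimated size in bytes of live data.
state.backend.rocksdb.metrics.estimate-live-data-size: true
```

As far as I know, these are reported per state on each subtask, so a job-wide figure would need to be summed in your metrics system.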


Best
Yun Tang