Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/05/31 14:37:00 UTC

[jira] [Created] (SPARK-24441) Expose total size of states in HDFSBackedStateStoreProvider

Jungtaek Lim created SPARK-24441:
------------------------------------

             Summary: Expose total size of states in HDFSBackedStateStoreProvider
                 Key: SPARK-24441
                 URL: https://issues.apache.org/jira/browse/SPARK-24441
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.3.0
            Reporter: Jungtaek Lim


While Spark exposes metrics for a single version of state, it still doesn't expose the overall memory usage of the cached state (loadedMaps) in HDFSBackedStateStoreProvider.

Since HDFSBackedStateStoreProvider caches multiple versions of the entire state in a hashmap, it can occupy much more memory than a single version of the state. With the default value of minVersionsToRetain, the cache map can grow to more than 100 times the size of a single state version. It would be better to expose this total as well so that end users can determine the actual memory usage for state.
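The back-of-the-envelope arithmetic behind this concern can be sketched as follows. This is an illustrative estimate only, not Spark's actual accounting; the function name is hypothetical, and the only Spark-specific fact assumed is that minVersionsToRetain (driven by spark.sql.streaming.minBatchesToRetain) defaults to 100.

```python
# Illustrative sketch: why the version cache in HDFSBackedStateStoreProvider
# can dwarf a single state version. Names are hypothetical, not Spark's API.

def estimate_cache_size(single_version_bytes: int, retained_versions: int) -> int:
    """Upper-bound estimate: the provider may keep up to
    `retained_versions` full copies of the state map in memory."""
    return single_version_bytes * retained_versions

# Assuming the default of 100 retained versions, a 50 MB state
# version can pin roughly 5 GB of executor heap.
single_state = 50 * 1024 * 1024       # 50 MB for one state version
min_versions_to_retain = 100          # Spark default
total = estimate_cache_size(single_state, min_versions_to_retain)
print(total // (1024 * 1024), "MB")   # → 5000 MB
```

This is why a per-version metric alone understates the real footprint: without a total over loadedMaps, users sizing executor memory see only 1/100th of the worst case.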



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
