Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/09/21 05:41:00 UTC

[jira] [Updated] (SPARK-24441) Expose total estimated size of states in HDFSBackedStateStoreProvider

     [ https://issues.apache.org/jira/browse/SPARK-24441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-24441:
---------------------------------
    Fix Version/s:     (was: 3.0.0)
                   2.4.0

> Expose total estimated size of states in HDFSBackedStateStoreProvider
> ---------------------------------------------------------------------
>
>                 Key: SPARK-24441
>                 URL: https://issues.apache.org/jira/browse/SPARK-24441
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 2.4.0
>
>
> While Spark exposes state metrics for a single state, it does not yet expose the overall memory usage of state (loadedMaps) in HDFSBackedStateStoreProvider.
> The rationale for the patch is that state backed by HDFSBackedStateStoreProvider consumes more memory than the number we can get from query status, because multiple versions of state are cached. The memory footprint can be much larger than what query status reports when the state store receives a lot of updates: shallow-copying the map adds only a small amount of memory for the map entries and references, and row objects are still shared across versions. If there are many updates between batches, fewer row objects are shared and more row objects exist in memory, consuming much more memory than we expect.
> It would be better to expose it as well so that end users can determine actual memory usage for state.
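
The sharing behaviour described above can be illustrated with a small Python toy model. This is a sketch under stated assumptions, not Spark's implementation: `commit_batch`, `count_rows`, and `simulate` are hypothetical names, rows are modelled as plain lists, and every version is assumed to stay cached (the provider's actual retention of versions is not modelled).

```python
def commit_batch(prev_version, updates):
    """Shallow-copy the previous version's map, then apply this batch's updates."""
    new_version = dict(prev_version)  # copies key -> reference pairs, not the rows
    new_version.update(updates)       # updated keys now point at new row objects
    return new_version


def count_rows(versions):
    """Return (rows in the latest version, distinct row objects across all versions)."""
    latest = len(versions[-1])
    # All versions are alive here, so id() uniquely identifies each row object.
    distinct = len({id(row) for version in versions for row in version.values()})
    return latest, distinct


def simulate(num_keys, num_batches, updates_per_batch):
    """Build num_batches cached versions, rewriting updates_per_batch rows per batch."""
    versions = [{key: [key, 0] for key in range(num_keys)}]
    for batch in range(1, num_batches + 1):
        updates = {key: [key, batch] for key in range(updates_per_batch)}
        versions.append(commit_batch(versions[-1], updates))
    return count_rows(versions)


few_updates = simulate(num_keys=1000, num_batches=3, updates_per_batch=10)
many_updates = simulate(num_keys=1000, num_batches=3, updates_per_batch=900)
print(few_updates)   # (1000, 1030)
print(many_updates)  # (1000, 3700)
```

In both runs the latest version holds 1000 rows, which corresponds to the per-state number. The count of distinct row objects held across all cached versions grows with the update rate, which is the gap a total estimated size would make visible.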



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
