Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2019/02/07 09:19:00 UTC

[jira] [Updated] (FLINK-11517) Inefficient window state access when using RocksDB state backend

     [ https://issues.apache.org/jira/browse/FLINK-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-11517:
----------------------------------
    Component/s: Local Runtime

> Inefficient window state access when using RocksDB state backend
> ----------------------------------------------------------------
>
>                 Key: FLINK-11517
>                 URL: https://issues.apache.org/jira/browse/FLINK-11517
>             Project: Flink
>          Issue Type: Bug
>          Components: Local Runtime
>            Reporter: Elias Levy
>            Priority: Major
>
> When using an aggregate function on a window with a process function and the RocksDB state backend, state access is inefficient.
> The WindowOperator calls windowState.add to merge the new element using the aggregate function.  The add method of RocksDBAggregatingState will read the state, deserialize the state, call the aggregate function, serialize the state, and write it out.
> If the trigger decides the window must be fired, as windowState.add does not return the state, the WindowOperator must call windowState.get to get it and pass it to the window process function, resulting in another read and deserialization.
> Finally, while the state is not passed in to the trigger, in some cases the trigger may have a need to access the state.  That is our case.  As the state is not passed to the trigger, we must read and deserialize the state once more from within the trigger.
> Thus, state must be read and deserialized three times to process a single element.  If the state is large, this can be quite costly.
>  
> Ideally, windowState.add would return the state, so that the WindowOperator can pass it to the process function without having to read it again.  Additionally, the state would be made available to the trigger to enable more use cases without having to go through the state descriptor again.
>  
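
The access pattern described above can be illustrated with a small standalone sketch. It does not use Flink or RocksDB: SerializedLongState, addAndGet, processCurrent and processProposed are hypothetical names invented for this illustration, a HashMap of byte arrays stands in for the state backend, and a sum over longs stands in for the aggregate function. It only counts how often the accumulator is deserialized while handling one element, first with an add that returns nothing, then with an add that returns the updated accumulator as the issue suggests.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: a hypothetical stand-in, not Flink's WindowOperator or RocksDBAggregatingState. */
public class WindowStateAccessSketch {

    /** Stores accumulators as bytes and counts every deserialization. */
    static final class SerializedLongState {
        private final Map<String, byte[]> store = new HashMap<>();
        int deserializations = 0;

        /** Read and deserialize the accumulator; 0 if no state exists yet. */
        long get(String key) {
            byte[] bytes = store.get(key);
            if (bytes == null) {
                return 0L;
            }
            deserializations++;
            return ByteBuffer.wrap(bytes).getLong();
        }

        /** Hypothetical variant proposed in the issue: aggregate and return the new accumulator. */
        long addAndGet(String key, long element) {
            long updated = get(key) + element; // read + deserialize + aggregate
            store.put(key, ByteBuffer.allocate(Long.BYTES).putLong(updated).array()); // serialize + write
            return updated;
        }

        /** Behaviour described in the issue: the updated accumulator is not returned. */
        void add(String key, long element) {
            addAndGet(key, element);
        }
    }

    /** Handles one element the way the issue describes; returns the number of deserializations. */
    static int processCurrent(SerializedLongState state, String key, long element) {
        state.deserializations = 0;
        state.add(key, element);             // 1st read: merge the element
        long seenByTrigger = state.get(key); // 2nd read: the trigger looks the state up itself
        if (seenByTrigger >= 0) {            // the trigger decides to fire
            state.get(key);                  // 3rd read: fetch the state for the process function
        }
        return state.deserializations;
    }

    /** Same element, but the accumulator returned by addAndGet is handed on directly. */
    static int processProposed(SerializedLongState state, String key, long element) {
        state.deserializations = 0;
        long acc = state.addAndGet(key, element); // only read
        if (acc >= 0) {                           // the trigger receives acc instead of reading state
            // the process function would receive acc here, with no further read
        }
        return state.deserializations;
    }

    public static void main(String[] args) {
        SerializedLongState state = new SerializedLongState();
        state.add("window-1", 1L); // seed the window so that later reads hit existing state
        System.out.println("current:  " + processCurrent(state, "window-1", 5L));
        System.out.println("proposed: " + processProposed(state, "window-1", 5L));
    }
}
```

With existing state for the window, the first path deserializes the accumulator three times per element and the second path once. The real cost depends on the serializer and the size of the accumulator, which this sketch does not model.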



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)