Posted to issues@beam.apache.org by "Steve Niemitz (Jira)" <ji...@apache.org> on 2020/05/27 02:06:00 UTC

[jira] [Assigned] (BEAM-7568) Java dataflow harness re-encodes value state cells even if they haven't changed

     [ https://issues.apache.org/jira/browse/BEAM-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Niemitz reassigned BEAM-7568:
-----------------------------------

    Assignee: Steve Niemitz

> Java dataflow harness re-encodes value state cells even if they haven't changed
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-7568
>                 URL: https://issues.apache.org/jira/browse/BEAM-7568
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>    Affects Versions: 2.13.0
>            Reporter: Steve Niemitz
>            Assignee: Steve Niemitz
>            Priority: P2
>
> The Java Dataflow worker seems to re-encode ValueState cells after every work item, even if they weren't modified.
> You can see here [https://github.com/apache/beam/blob/a71bfda77df36aa1531f01533c372233cfba0dd9/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateInternals.java#L413] that the value is always encoded (and used to weight the cache entry) even if it won't be persisted back to Windmill.
> This can have large performance implications if the values being stored are expensive/large to encode and infrequently modified.  Ideally, the weight would also be cached, and the value would only need to be re-encoded if it was changed.
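
A minimal sketch of the proposed behavior, for illustration only. It is not the WindmillStateInternals code: CachedValueCell, its methods, and the Function-based encoder (a stand-in for a Beam Coder) are all hypothetical names. The cell tracks a dirty flag, encodes only when the value was written since the last persist, and reuses the encoded size recorded at that time as the cache weight.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical value cell; not part of the Beam or Dataflow worker API. */
final class CachedValueCell<T> {
    private final Function<T, byte[]> encoder; // assumption: stand-in for a Coder
    private T value;
    private boolean dirty;      // set by write(), cleared by persist()
    private long cachedWeight;  // encoded size recorded at the last encode
    private int encodeCalls;    // instrumentation for this example only

    CachedValueCell(Function<T, byte[]> encoder) {
        this.encoder = encoder;
    }

    T read() {
        return value;
    }

    void write(T newValue) {
        value = newValue;
        dirty = true;
    }

    /**
     * Intended to run once at the end of each work item. Returns the bytes
     * to persist, or null when the value was not written since the last call.
     */
    byte[] persist() {
        if (!dirty) {
            return null; // unchanged: skip encoding, keep the cached weight
        }
        byte[] encoded = encoder.apply(value);
        encodeCalls++;
        cachedWeight = encoded.length;
        dirty = false;
        return encoded;
    }

    /** Weight for the cache entry, taken from the last encode. */
    long weight() {
        return cachedWeight;
    }

    int encodeCalls() {
        return encodeCalls;
    }
}
```

With a cell like this, a work item that only reads the value pays no encoding cost, and the cache can still be weighted using the size recorded the last time the value was actually written.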



--
This message was sent by Atlassian Jira
(v8.3.4#803005)