Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/07/10 17:21:01 UTC

[jira] [Commented] (BEAM-11629) Optimize the cache storage for InteractiveRunner

    [ https://issues.apache.org/jira/browse/BEAM-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378504#comment-17378504 ] 

Beam JIRA Bot commented on BEAM-11629:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.

> Optimize the cache storage for InteractiveRunner
> ------------------------------------------------
>
>                 Key: BEAM-11629
>                 URL: https://issues.apache.org/jira/browse/BEAM-11629
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Dmytro Kozhevin
>            Assignee: Dmytro Kozhevin
>            Priority: P2
>              Labels: stale-assigned
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently InteractiveRunner wraps every record of the cached PCollection in a WindowedValue. There are two problems with this:
> 1) The windowing information is unnecessary for the batch-mode runs (everything is in the same global window).
> 2) Since the cache is stored as text, we pickle the WindowedValue, which adds ~500 bytes of data to every record (e.g. a cache of just 1000000 integers would take ~500MB instead of ~4MB).
> These issues significantly slow down interactive runs on data with many small rows.
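
The per-record overhead described in point 2 can be illustrated with a rough sketch using only the Python standard library. This is not Beam code: `wrap_with_window_metadata` is a hypothetical stand-in that attaches windowing-style fields to each value, and it does not reproduce the real WindowedValue layout or the ~500 byte figure quoted above. It only shows that pickling a wrapper per record costs more than pickling the bare value.

```python
import pickle
from typing import Any, Dict, Tuple


def wrap_with_window_metadata(value: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a windowed record (NOT Beam's WindowedValue).

    In batch mode every record would carry the same metadata, so the
    extra fields are pure overhead once serialized per record.
    """
    return {
        "value": value,
        "timestamp_micros": 0,
        "windows": ("GlobalWindow",),
        "pane_info": (True, True, 0),
    }


def pickled_size(obj: Any) -> int:
    """Return the number of bytes pickle needs for a single object."""
    return len(pickle.dumps(obj))


def compare_sizes(n: int = 1000) -> Tuple[int, int]:
    """Total pickled bytes for n integers, stored bare vs. wrapped."""
    raw_total = sum(pickled_size(i) for i in range(n))
    wrapped_total = sum(
        pickled_size(wrap_with_window_metadata(i)) for i in range(n)
    )
    return raw_total, wrapped_total


raw_total, wrapped_total = compare_sizes()
print(f"bare values:    {raw_total} bytes")
print(f"wrapped values: {wrapped_total} bytes")
```

The exact ratio depends on the wrapper and the pickle protocol, so the numbers printed here should not be read as a measurement of InteractiveRunner itself.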



--
This message was sent by Atlassian Jira
(v8.3.4#803005)