Posted to issues@beam.apache.org by "Ismaël Mejía (Jira)" <ji...@apache.org> on 2020/05/18 14:38:00 UTC

[jira] [Updated] (BEAM-9748) Refactor Reparallelize as an alternative Reshuffle implementation

     [ https://issues.apache.org/jira/browse/BEAM-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9748:
-------------------------------
    Summary: Refactor Reparallelize as an alternative Reshuffle implementation  (was: Add Reshuffle.ForSequentiallyGeneratedInput transform)

> Refactor Reparallelize as an alternative Reshuffle implementation
> -----------------------------------------------------------------
>
>                 Key: BEAM-9748
>                 URL: https://issues.apache.org/jira/browse/BEAM-9748
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: P3
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Some DoFn-based IOs such as JdbcIO and RedisIO use their own approach to reparallelize their output: a combination of an empty PCollectionView, which forces materialization, and Reshuffle.viaRandomKey, which redistributes the PCollection. This issue extracts that transform and exposes it as part of Reshuffle, so that transforms (notably IOs) that produce large amounts of sequentially generated data can benefit from this alternative approach to reparallelizing their output without duplicating the code.
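
The random-key half of the approach described above can be illustrated without Beam. The following is a minimal plain-Java sketch, not the Beam implementation: the class name, the method names, and the fixed key space `numKeys` are hypothetical and chosen only for illustration, and the materialization step (the empty PCollectionView side input) is left out entirely. It shows only why pairing sequentially produced elements with random keys and grouping by key spreads them into groups that could be processed in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class RandomKeyReshuffleSketch {

    /** Pairs every element with a random key and groups by that key, as a shuffle would. */
    static <T> Map<Integer, List<T>> shuffleByRandomKey(List<T> input, int numKeys, Random random) {
        Map<Integer, List<T>> groups = new TreeMap<>();
        for (T element : input) {
            // Assumption: a small fixed key space, used here only to keep the output readable.
            int key = random.nextInt(numKeys);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
        }
        return groups;
    }

    /** Drops the keys again, so downstream code sees only the redistributed elements. */
    static <T> List<T> dropKeys(Map<Integer, List<T>> groups) {
        List<T> output = new ArrayList<>();
        for (List<T> group : groups.values()) {
            output.addAll(group);
        }
        return output;
    }

    public static void main(String[] args) {
        // Rows produced one after another by a single reader, e.g. one large query.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }

        Map<Integer, List<Integer>> groups = shuffleByRandomKey(rows, 8, new Random(42));
        for (Map.Entry<Integer, List<Integer>> entry : groups.entrySet()) {
            System.out.println("key " + entry.getKey() + " -> " + entry.getValue().size() + " rows");
        }
        System.out.println("rows after dropping keys: " + dropKeys(groups).size());
    }
}
```

In a real pipeline the grouping is done by the runner's shuffle rather than an in-memory map, and each group can be handed to a different worker. The sketch only demonstrates that no element is lost or duplicated while the original sequential ordering is broken up.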


