Posted to issues@beam.apache.org by "Kevin (Jira)" <ji...@apache.org> on 2022/01/13 02:39:00 UTC

[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

    [ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475064#comment-17475064 ] 

Kevin commented on BEAM-1438:
-----------------------------

Hello [~kenn], I didn't see any open pull requests. Did you mean [https://github.com/apache/beam/pull/11850] or [https://github.com/apache/beam/pull/1952]? Both are closed, and I didn't see any updates on either of them after 2020. We ran into this issue around September 2021 while running the Dataflow runner on GCP. Have any fixes been merged in the months since? Thanks.

> The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1438
>                 URL: https://issues.apache.org/jira/browse/BEAM-1438
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Reuven Lax
>            Priority: P3
>             Fix For: 2.5.0
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> If a Write specifies 0 output shards, that implies the runner should pick an appropriate sharding. The default behavior is to write one shard per input bundle. This works well with the Dataflow batch runner, but not with the streaming runner which produces large numbers of small bundles.
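The effect described in the issue can be illustrated with a toy model. The functions below (`default_sharding`, `fixed_sharding`, `summarize`) are hypothetical and are not part of the Beam API. They only count how many output files each strategy would produce. The model assumes that under the default (0 shards) every input bundle becomes exactly one file, and that an explicit shard count spreads all records as evenly as possible over that many files, regardless of bundle boundaries.

```python
from typing import List


def default_sharding(bundle_sizes: List[int]) -> List[int]:
    """Hypothetical model of numShards == 0: one output file per input bundle.

    Returns the number of records written to each output file.
    """
    return [size for size in bundle_sizes if size > 0]


def fixed_sharding(bundle_sizes: List[int], num_shards: int) -> List[int]:
    """Hypothetical model of an explicit shard count: all records are spread
    as evenly as possible over num_shards files, ignoring bundle boundaries."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    total = sum(bundle_sizes)
    base, extra = divmod(total, num_shards)
    # The first `extra` shards each receive one additional record.
    return [base + (1 if i < extra else 0) for i in range(num_shards)]


def summarize(label: str, shards: List[int]) -> None:
    """Print the file count and the average number of records per file."""
    avg = sum(shards) / len(shards) if shards else 0
    print(f"{label}: {len(shards)} files, ~{avg:,.0f} records/file")


if __name__ == "__main__":
    records = 1_000_000
    batch_bundles = [records // 4] * 4          # a few large bundles (batch-like)
    streaming_bundles = [10] * (records // 10)  # many tiny bundles (streaming-like)

    summarize("batch, default", default_sharding(batch_bundles))
    summarize("streaming, default", default_sharding(streaming_bundles))
    summarize("streaming, 10 shards", fixed_sharding(streaming_bundles, 10))
```

With the same million records, the per-bundle default yields a handful of large files for the batch-like input but a huge number of tiny files for the streaming-like input, while a fixed shard count keeps the file count bounded. In the Java SDK an explicit count can be requested on a file sink, for example with `TextIO.write().withNumShards(n)`.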



--
This message was sent by Atlassian Jira
(v8.20.1#820001)