
Posted to issues@beam.apache.org by "Yichi Zhang (Jira)" <ji...@apache.org> on 2022/05/09 18:58:00 UTC

[jira] [Updated] (BEAM-14429) SyntheticUnboundedSource(with SDF) produce duplicate records when split with DEFAULT_DESIRED_NUM_SPLITS

     [ https://issues.apache.org/jira/browse/BEAM-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yichi Zhang updated BEAM-14429:
-------------------------------
    Summary: SyntheticUnboundedSource(with SDF) produce duplicate records when split with DEFAULT_DESIRED_NUM_SPLITS  (was: SyntheticUnboundedSource(with SDF) produce wrong number of records when initial split is larger than 1)

> SyntheticUnboundedSource(with SDF) produce duplicate records when split with DEFAULT_DESIRED_NUM_SPLITS
> -------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-14429
>                 URL: https://issues.apache.org/jira/browse/BEAM-14429
>             Project: Beam
>          Issue Type: Bug
>          Components: io-common
>            Reporter: Yichi Zhang
>            Priority: P2
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> With the default of 20 splits, the number of records produced by Read.from(SyntheticUnboundedSource) is always larger than the specified numRecords. The more splits there are, the further the actual record count is from numRecords. The Read step also tends to take longer with more splits.
>  
> The issue manifests in Java LoadTests on Dataflow Runner v2.
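
The invariant the report expects is that splitting must not change the total: however many splits the source is divided into, together they should emit exactly numRecords records, each one once. The following is a minimal sketch of that invariant only, assuming the records can be modeled as an offset range [0, numRecords) that each split covers a disjoint, contiguous piece of. `Range` and `split` are hypothetical names for illustration; they are not the Beam or SyntheticUnboundedSource API, and the sketch does not claim to reproduce how the real source splits or where the duplication comes from.

```java
import java.util.ArrayList;
import java.util.List;

class SplitInvariantSketch {
  // Hypothetical half-open offset range [from, to); not a Beam type.
  record Range(long from, long to) {
    long size() {
      return to - from;
    }
  }

  // Hypothetical helper: partitions [0, numRecords) into at most desiredNumSplits
  // disjoint, contiguous ranges whose sizes differ by at most one.
  static List<Range> split(long numRecords, int desiredNumSplits) {
    if (numRecords < 0 || desiredNumSplits <= 0) {
      throw new IllegalArgumentException("numRecords must be >= 0 and desiredNumSplits > 0");
    }
    List<Range> ranges = new ArrayList<>();
    long base = numRecords / desiredNumSplits;
    long remainder = numRecords % desiredNumSplits;
    long start = 0;
    // Stop early if there are fewer records than splits, so no empty ranges are emitted.
    for (int i = 0; i < desiredNumSplits && start < numRecords; i++) {
      long size = base + (i < remainder ? 1 : 0);
      ranges.add(new Range(start, start + size));
      start += size;
    }
    return ranges;
  }

  public static void main(String[] args) {
    List<Range> splits = split(1000, 20);
    long total = 0;
    long expectedStart = 0;
    for (Range r : splits) {
      // Each split must begin exactly where the previous one ended:
      // a gap loses records, an overlap duplicates them.
      if (r.from() != expectedStart) {
        throw new AssertionError("gap or overlap at " + r);
      }
      total += r.size();
      expectedStart = r.to();
    }
    // Prints "20 splits, 1000 records": the total equals numRecords regardless of split count.
    System.out.println(splits.size() + " splits, " + total + " records");
  }
}
```

A check of this shape (the per-split counts must sum to numRecords, with no overlapping ranges) is one way a test could detect the duplicate records described above, independent of how many splits are requested.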



--
This message was sent by Atlassian Jira
(v8.20.7#820007)