Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/01/14 20:01:00 UTC
[jira] [Updated] (BEAM-12664) Improve textio: Write sharding
[ https://issues.apache.org/jira/browse/BEAM-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-12664:
-----------------------------------
Status: Open (was: Triage Needed)
> Improve textio: Write sharding
> ------------------------------
>
> Key: BEAM-12664
> URL: https://issues.apache.org/jira/browse/BEAM-12664
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P3
>
> The other SDKs have implementations that shard files on write. So should the Go SDK. The feature is mentioned in the Beam Programming Guide:
> [https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files]
> It would be more expedient to provide an xlang (cross-language) TextIO implementation for the Go SDK than to replicate the implementation in Go, at the cost of some execution-time performance.
> Ideally, it would also be generalized to simplify writing file sinks, which are necessarily complex in order to be robust and reliable.
> Current Go implementation:
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119]
> Python FileIO implementation:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py]
> (Note: iobase.Sink is deprecated but is still suitable for file IO.)
> Java TextIO & FileIO:
> [https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259]
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>
> KafkaIO (an example of a Go SDK-side wrapper for an xlang Java IO):
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go]
>
> General docs on writing sinks: [https://beam.apache.org/documentation/io/developing-io-overview/#sinks]
>