Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/01/14 20:01:00 UTC

[jira] [Updated] (BEAM-12664) Improve textio: Write sharding

     [ https://issues.apache.org/jira/browse/BEAM-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-12664:
-----------------------------------
    Status: Open  (was: Triage Needed)

> Improve textio: Write sharding
> ------------------------------
>
>                 Key: BEAM-12664
>                 URL: https://issues.apache.org/jira/browse/BEAM-12664
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Priority: P3
>
> The other SDKs have implementations that shard files on write, and the Go SDK should too. The feature is described in the Beam Programming Guide:
> [https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files]
> It would be more expedient to provide a cross-language (xlang) TextIO implementation for the Go SDK than to replicate the implementation in Go, at the cost of some execution-time performance.
> Ideally, it would be similarly generalized to simplify writing file sinks, since a robust and reliable file sink implementation is necessarily complex.
> Current Go implementation:
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119]
> Python FileIO implementation:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py] 
> (Note iobase.Sink is deprecated, but is still suitable for file io.)
> Java TextIO & FileIO:
> [https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259] 
> [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java] 
>  
> KafkaIO (example of writing a Go SDK-side wrapper for an xlang Java IO):
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go] 
>  
> General docs on writing sinks: [https://beam.apache.org/documentation/io/developing-io-overview/#sinks] 
>  
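The following standalone Go sketch illustrates what "sharding files on write" means in practice. It does not use the Beam Go SDK API. The helpers shardName, assignShard and shardLines are hypothetical. The file name template is modeled on the "-SSSSS-of-NNNNN" shard name template that the Java and Python SDKs use by default. Hash-based shard assignment is a swapped-in choice made here for determinism; the other SDKs may assign elements to shards differently (for example, randomly).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardName builds a file name of the form "<prefix>-SSSSS-of-NNNNN<suffix>".
// Hypothetical helper; the template is an assumption modeled on other SDKs.
func shardName(prefix, suffix string, shard, numShards int) string {
	return fmt.Sprintf("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix)
}

// assignShard deterministically maps a line to a shard index in [0, numShards)
// by hashing its contents with FNV-1a.
func assignShard(line string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(line))
	return int(h.Sum32() % uint32(numShards))
}

// shardLines groups lines by the shard file they would be written to.
func shardLines(lines []string, prefix, suffix string, numShards int) map[string][]string {
	out := make(map[string][]string)
	for _, l := range lines {
		name := shardName(prefix, suffix, assignShard(l, numShards), numShards)
		out[name] = append(out[name], l)
	}
	return out
}

func main() {
	lines := []string{"alpha", "beta", "gamma", "delta", "epsilon"}
	// Map iteration order is unspecified, so the output order may vary.
	for name, group := range shardLines(lines, "out/part", ".txt", 3) {
		fmt.Println(name, len(group))
	}
}
```

In an actual pipeline, the shard assignment would typically run in a DoFn that keys each element by its shard. A GroupByKey would then gather each shard's elements so that one worker writes each file. Committing those files robustly (for example, writing to temporary files and renaming them on success) is the part the description calls necessarily complex.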



--
This message was sent by Atlassian Jira
(v8.20.1#820001)