Posted to commits@beam.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/07/04 17:51:00 UTC
[jira] [Commented] (BEAM-2500) Add support for S3 as a Apache Beam FileSystem
[ https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073967#comment-16073967 ]
Steve Loughran commented on BEAM-2500:
--------------------------------------
It's not clear that the S3 clients from EMR or Apache (S3A) work with Beam, and it won't be until there are tests to show it. There is certainly a report on [StackOverflow|https://stackoverflow.com/questions/44792884/apache-beam-unable-to-read-text-file-from-s3-using-hadoop-file-system-sdk] which implies that Beam depends on a read(ByteBuffer) operation which is not implemented by S3A (nor, indeed, by Azure WASB).
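To illustrate the kind of gap involved: as far as I know, Hadoop's FSDataInputStream.read(ByteBuffer) throws UnsupportedOperationException when the wrapped stream does not support byte-buffer reads. Below is a minimal sketch, using only the JDK, of how a caller could fall back to a plain read(byte[]) in that case. The class ByteBufferReadFallback and the interface BufferReadable are hypothetical names invented for this example; this is not Beam's or Hadoop's code, and not necessarily how the issue would actually be fixed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of Beam or Hadoop. */
public final class ByteBufferReadFallback {

  /** Hypothetical stand-in for a stream API that may or may not support ByteBuffer reads. */
  interface BufferReadable {
    int read(ByteBuffer dst) throws IOException;
  }

  private ByteBufferReadFallback() {}

  /**
   * Tries the ByteBuffer read first. If the stream does not offer it, or
   * reports it as unsupported, copies through a temporary byte[] instead.
   * Returns the number of bytes read, 0 if dst is full, or -1 at end of stream.
   */
  static int readInto(InputStream in, ByteBuffer dst) throws IOException {
    if (in instanceof BufferReadable) {
      try {
        return ((BufferReadable) in).read(dst);
      } catch (UnsupportedOperationException e) {
        // The stream advertises the method but does not implement it;
        // fall through to the byte[] path below.
      }
    }
    if (!dst.hasRemaining()) {
      return 0;
    }
    byte[] tmp = new byte[Math.min(dst.remaining(), 8192)];
    int n = in.read(tmp, 0, tmp.length);
    if (n > 0) {
      dst.put(tmp, 0, n);
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    // A plain in-memory stream stands in for an S3-backed stream here.
    InputStream in = new ByteArrayInputStream("hello s3".getBytes(StandardCharsets.UTF_8));
    ByteBuffer buf = ByteBuffer.allocate(16);
    int n = readInto(in, buf);
    buf.flip();
    System.out.println(n + " bytes: " + StandardCharsets.UTF_8.decode(buf));
  }
}
```

The fallback costs an extra copy per read, which is the trade-off for working against streams that only implement the classic InputStream contract.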
> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>
> Key: BEAM-2500
> URL: https://issues.apache.org/jira/browse/BEAM-2500
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Luke Cwik
> Priority: Minor
>
> Note that this is for providing direct integration with S3 as an Apache Beam FileSystem.
> There is already support for using the Hadoop S3 connector by depending on the Hadoop File System module[1] and configuring HadoopFileSystemOptions[2] with an S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2: https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3