Posted to commits@beam.apache.org by "Jean-Baptiste Onofré (JIRA)" <ji...@apache.org> on 2016/10/06 12:27:22 UTC
[jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign
[ https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551806#comment-15551806 ]
Jean-Baptiste Onofré commented on BEAM-59:
------------------------------------------
Hi guys,
Support for HDFS (and other filesystems such as S3) is a recurring question. Even though we have HdfsIO, I think it would make sense to remove it and to refactor FileBasedSource/FileBasedSink to support those new filesystems.
[~dhalperi@google.com] maybe we can work together on a proposal that we can send to the mailing list?
Thanks!
> IOChannelFactory rethinking/redesign
> ------------------------------------
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core, sdk-java-gcp
> Reporter: Daniel Halperin
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" util package and subject to change. We need users to be able to add new backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration time, etc.
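
For illustration only, the first two points in the issue description (per-source/per-sink configuration, and user-pluggable backends keyed by URI scheme) could look roughly like the sketch below. Everything in it is an assumption made for the example: FileBackend, Registry, and backendWithCredentials are hypothetical names, not part of Beam or of IOChannelFactory, and the "credentials" are just labels.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Beam.
public class FileSystemRegistrySketch {

    /** Hypothetical interface a user would implement to add a backend such as 's3://'. */
    interface FileBackend {
        String describe(String uri);
    }

    /** Hypothetical registry owned by a single source or sink instead of being global. */
    static final class Registry {
        private final Map<String, FileBackend> backends = new HashMap<>();

        Registry register(String scheme, FileBackend backend) {
            backends.put(scheme, backend);
            return this;
        }

        /** Looks up the backend by the scheme that precedes "://" in the URI. */
        FileBackend forUri(String uri) {
            int idx = uri.indexOf("://");
            if (idx < 0) {
                throw new IllegalArgumentException("No scheme in URI: " + uri);
            }
            String scheme = uri.substring(0, idx);
            FileBackend backend = backends.get(scheme);
            if (backend == null) {
                throw new IllegalArgumentException("No backend for scheme: " + scheme);
            }
            return backend;
        }
    }

    /** Builds a toy backend that carries its own credentials label. */
    static FileBackend backendWithCredentials(String credentials) {
        return uri -> uri + " via " + credentials;
    }

    public static void main(String[] args) {
        // Two registries: the same 'gs' scheme, but different credentials per source/sink.
        Registry sourceRegistry = new Registry()
                .register("gs", backendWithCredentials("source-credentials"));
        Registry sinkRegistry = new Registry()
                .register("gs", backendWithCredentials("sink-credentials"));

        String in = "gs://bucket/in.txt";
        String out = "gs://bucket/out.txt";
        System.out.println(sourceRegistry.forUri(in).describe(in));
        System.out.println(sinkRegistry.forUri(out).describe(out));
    }
}
```

The point of the sketch is only that the scheme-to-backend mapping lives on the source or sink rather than in global state; per-backend features such as bucket creation are left out.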