Posted to issues@orc.apache.org by "Ismaël Mejía (JIRA)" <ji...@apache.org> on 2019/08/02 14:49:04 UTC

[jira] [Updated] (ORC-508) Add a reader/writer that does not depend on Hadoop FileSystem

     [ https://issues.apache.org/jira/browse/ORC-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated ORC-508:
-----------------------------
    Description: It seems that the default implementation classes of ORC today depend on Hadoop FS objects to write. This is not ideal for APIs that do not rely on Hadoop. For some context, I was looking at adding support for Apache Beam, but Beam's API supports multiple filesystems with a more generic abstraction that relies on Java's Channels and Streams APIs and delegates directly to distributed filesystems, e.g. Google Cloud Storage, Amazon S3, etc. It would be really nice to have such support in the core implementation and perhaps to split the Hadoop-dependent implementation into its own module in the future.  (was: It seems that the default implementation classes of Orc today depend on Hadoop FS objects to write. This is not ideal for APIs that do not rely on Hadoop. For some context I was taking a look at adding support for Apache Beam, but Beam's API supports multiple filesystems with a more generic abstraction that relies on Java's Channels and Streams APIs. That delegate directly to Distributed FS e.g. Google Cloud Storage, Amazon S3, etc. It would be really nice to have such support in the core implementation and to maybe split the hadoop depending implementation into its own module in the future.)

> Add a reader/writer that does not depend on Hadoop FileSystem
> -------------------------------------------------------------
>
>                 Key: ORC-508
>                 URL: https://issues.apache.org/jira/browse/ORC-508
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ismaël Mejía
>            Priority: Major
>
> It seems that the default implementation classes of ORC today depend on Hadoop FS objects to write. This is not ideal for APIs that do not rely on Hadoop. For some context, I was looking at adding support for Apache Beam, but Beam's API supports multiple filesystems with a more generic abstraction that relies on Java's Channels and Streams APIs and delegates directly to distributed filesystems, e.g. Google Cloud Storage, Amazon S3, etc. It would be really nice to have such support in the core implementation and perhaps to split the Hadoop-dependent implementation into its own module in the future.
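To illustrate the kind of abstraction the issue asks for: Beam-style filesystem APIs hand back a plain `java.nio.channels.WritableByteChannel` for any backend (local disk, GCS, S3, ...), with no Hadoop types involved. The sketch below shows a hypothetical byte-sink interface a Hadoop-free ORC writer could target instead of an `org.apache.hadoop.fs.Path`; the `BytesSink` interface and class name are illustrative assumptions, not part of the ORC API, and only JDK classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChannelSinkSketch {

    /** Hypothetical stand-in for a writer output that only needs raw bytes. */
    interface BytesSink {
        void write(ByteBuffer data) throws IOException;
    }

    /** Adapts any NIO channel (the object a Beam-style FS API returns) to the sink. */
    static BytesSink fromChannel(WritableByteChannel channel) {
        return buffer -> {
            // A single channel.write() may not drain the buffer; loop until empty.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // In Beam, FileSystems.create(...) would supply the channel; here an
        // in-memory stream keeps the example self-contained.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        WritableByteChannel channel = Channels.newChannel((OutputStream) bytes);

        BytesSink sink = fromChannel(channel);
        sink.write(ByteBuffer.wrap("ORC".getBytes(StandardCharsets.UTF_8)));
        channel.close();

        System.out.println(bytes.toString("UTF-8"));
    }
}
```

The point of the sketch is that nothing above references a Hadoop `FileSystem` or `Configuration`; if ORC's core writer accepted a channel or `OutputStream` like this, the Hadoop-specific path handling could live in a separate module.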



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)