Posted to issues@orc.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2019/05/31 22:00:00 UTC

[jira] [Commented] (ORC-508) Add a reader/writer that does not depend on Hadoop FileSystem

    [ https://issues.apache.org/jira/browse/ORC-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853448#comment-16853448 ] 

Owen O'Malley commented on ORC-508:
-----------------------------------

The main problem is going to be that a couple of Hadoop classes are in the API. We can't remove them without breaking compatibility. I'd suggest making a new module (orc-hadoop-proxy?) that contains a few classes that satisfy the required contract.

I assume that you only care about core and not mapreduce or tools.

Classes that I know about:
* Configuration
* FileSystem
* Path
* VersionInfo

You would then be able to put that module on the classpath instead of Hadoop and have the rest of the ORC library work as intended.
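
As a rough illustration of what one class in such a module might look like, here is a minimal sketch of a stand-in for Path. Everything about it is an assumption rather than a description of ORC or Hadoop: the set of methods ORC actually requires has not been audited here, and in a real proxy module the class would need to be declared in the org.apache.hadoop.fs package so that it resolves under the same fully qualified name. The package declaration is left out so the sketch compiles on its own.

```java
import java.net.URI;

// Hypothetical stand-in for org.apache.hadoop.fs.Path.
// Assumption: only a small subset of the real class's methods is needed;
// the methods below mirror names that exist on the Hadoop class, but the
// subset chosen is illustrative, not the audited contract.
final class Path {
    private final URI uri;

    // Build a path from a URI-style string such as "file:///tmp/data".
    // Strings that are not valid URIs (e.g. containing spaces) are rejected
    // by URI.create with an IllegalArgumentException.
    Path(String pathString) {
        this.uri = URI.create(pathString);
    }

    // Resolve a child name against a parent path.
    Path(Path parent, String child) {
        String base = parent.uri.toString();
        String joined = base.endsWith("/") ? base + child : base + "/" + child;
        this.uri = URI.create(joined);
    }

    URI toUri() {
        return uri;
    }

    // Final component of the path, i.e. everything after the last '/'.
    // Assumes a hierarchical URI, so getPath() is non-null.
    String getName() {
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    @Override
    public String toString() {
        return uri.toString();
    }
}
```

With something like this, `new Path(new Path("file:///tmp/data"), "part-0.orc").getName()` would yield the file name without any Hadoop jar on the classpath. Configuration, FileSystem, and VersionInfo would need similar stand-ins, and FileSystem is likely the one with the most behavior to cover.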

> Add a reader/writer that does not depend on Hadoop FileSystem
> -------------------------------------------------------------
>
>                 Key: ORC-508
>                 URL: https://issues.apache.org/jira/browse/ORC-508
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ismaël Mejía
>            Priority: Major
>
> It seems that the default implementation classes of ORC today depend on Hadoop FS objects to write. This is not ideal for APIs that do not rely on Hadoop. For some context, I was taking a look at adding support for Apache Beam, but Beam's API supports multiple filesystems through a more generic abstraction that relies on Java's Channels and Streams APIs, which delegate directly to distributed filesystems, e.g. Google Cloud Storage, Amazon S3, etc. It would be really nice to have such support in the core implementation and maybe to split the Hadoop-dependent implementation into its own module in the future.
>  
>  
> After a look at some parts of the `orc-core`



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)