Posted to issues@orc.apache.org by "Owen O'Malley (Jira)" <ji...@apache.org> on 2021/02/17 00:16:00 UTC

[jira] [Commented] (ORC-617) Provide facility to read ORC data from FSDataInputStream, not only from Path.

    [ https://issues.apache.org/jira/browse/ORC-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285559#comment-17285559 ] 

Owen O'Malley commented on ORC-617:
-----------------------------------

I commented on ORC-618, but I guess it is relevant here too:

I'd recommend using the org.apache.orc.util.StreamWrapperFileSystem. It was created exactly for this purpose.

You might also be interested in ORC-508, which introduces a new API that does not depend on Hadoop classes.


> Provide facility to read ORC data from FSDataInputStream, not only from Path.
> -----------------------------------------------------------------------------
>
>                 Key: ORC-617
>                 URL: https://issues.apache.org/jira/browse/ORC-617
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Pasha Finkeshteyn
>            Priority: Major
>
> I've run into the following issue: I need to partially read an ORC file, say the first 1000 rows.
> Currently I can't use the OrcFile class for that because it needs a local file, but in reality an FSDataInputStream should be enough.
> _*Motivation:*_ I want to read this data not only from the local file system or HDFS but also from, say, S3. S3 can open a read stream at any requested position, so it is easy to implement an adapter from it to FSDataInputStream.
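
The adapter idea in the motivation can be illustrated without any Hadoop or ORC classes. The following is a minimal sketch, not the ORC or Hadoop API: RangedSource, SeekableRangedStream, InMemorySource and RangedStreamDemo are hypothetical names invented for this example, and the in-memory byte array only stands in for a store such as S3 that can open a read at an arbitrary offset. For real ORC use, the comment above points to org.apache.orc.util.StreamWrapperFileSystem instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a store that can open a read at any offset. */
interface RangedSource {
  long length();
  InputStream openAt(long offset) throws IOException;
}

/** Seekable view over a RangedSource; the ranged read is reopened lazily after a seek. */
final class SeekableRangedStream extends InputStream {
  private final RangedSource source;
  private InputStream current; // null until the first read after a seek
  private long position;

  SeekableRangedStream(RangedSource source) {
    this.source = source;
  }

  void seek(long newPosition) throws IOException {
    if (newPosition < 0 || newPosition > source.length()) {
      throw new IOException("seek out of range: " + newPosition);
    }
    if (newPosition != position) {
      closeCurrent(); // the next read opens a new ranged stream
      position = newPosition;
    }
  }

  long getPos() {
    return position;
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    int n = read(one, 0, 1);
    return n < 0 ? -1 : (one[0] & 0xff);
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    if (position >= source.length()) {
      return -1;
    }
    if (current == null) {
      current = source.openAt(position);
    }
    int n = current.read(buf, off, len);
    if (n > 0) {
      position += n;
    }
    return n;
  }

  @Override
  public void close() throws IOException {
    closeCurrent();
  }

  private void closeCurrent() throws IOException {
    if (current != null) {
      current.close();
      current = null;
    }
  }
}

/** In-memory source used only to exercise the adapter. */
final class InMemorySource implements RangedSource {
  private final byte[] data;

  InMemorySource(byte[] data) {
    this.data = data;
  }

  public long length() {
    return data.length;
  }

  public InputStream openAt(long offset) {
    return new ByteArrayInputStream(data, (int) offset, data.length - (int) offset);
  }
}

final class RangedStreamDemo {
  /** Seeks to offset and reads up to len bytes as ASCII text. */
  static String readAt(SeekableRangedStream in, long offset, int len) throws IOException {
    in.seek(offset);
    byte[] buf = new byte[len];
    int total = 0;
    while (total < len) {
      int n = in.read(buf, total, len - total);
      if (n < 0) {
        break;
      }
      total += n;
    }
    return new String(buf, 0, total, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "header|stripe-1|stripe-2|footer".getBytes(StandardCharsets.US_ASCII);
    try (SeekableRangedStream in = new SeekableRangedStream(new InMemorySource(data))) {
      System.out.println(readAt(in, 25, 6)); // read the tail first
      System.out.println(readAt(in, 7, 8));  // then jump back to an earlier region
    }
  }
}
```

The demo reads the tail before an earlier region because an ORC file keeps its metadata at the end of the file, so a reader needs random access rather than a purely sequential stream. A real adapter would additionally have to satisfy the interfaces that FSDataInputStream expects from the stream it wraps, which this sketch does not attempt.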



--
This message was sent by Atlassian Jira
(v8.3.4#803005)