Posted to dev@orc.apache.org by Kavinder Dhaliwal <ka...@gmail.com> on 2016/06/23 23:59:57 UTC

Example of Reading a ORC file

Hi,

I am new to the ORC library and am looking for an example of how to read
ORC files directly through the Java API. Specifically, how to project
columns through the RecordReader. I have taken a look at the example at
https://orc.apache.org/docs/core-java.html but don't know how to actually
extract a single row from the inner loop. The Hive ORC RecordReader
interface has a .next() method which I don't see available in the
org.apache.orc interface.

I appreciate the help and apologize for my ignorance

Kavinder

Re: Example of Reading a ORC file

Posted by Prasanth J <j....@gmail.com>.
Hi

ORC has moved to vectorized readers completely. That's the reason you are
not seeing a .next() interface. The vector reader interface is .nextBatch(batch).
ORC no longer returns records; instead, it returns column batches
(there is an open JIRA for cleaning up the interfaces).
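
To make "column batches instead of records" concrete, here is a minimal
sketch of pulling row r out of a batch. It does not use the ORC jars:
LongColumnSketch and BatchRowsSketch are hypothetical stand-ins written for
this illustration. The field names (vector, isNull, noNulls, isRepeating)
follow the LongColumnVector class that ORC's VectorizedRowBatch exposes
through batch.cols, with batch.size giving the row count, but the code
below is an assumption-laden model rather than the library itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one column of a vectorized batch; NOT an ORC class.
class LongColumnSketch {
    long[] vector;               // one value per row (only slot 0 matters when isRepeating)
    boolean[] isNull;            // per-row null flags, consulted only when noNulls is false
    boolean noNulls = true;      // true means no row in this column is null
    boolean isRepeating = false; // true means every row shares the value in slot 0

    LongColumnSketch(int capacity) {
        vector = new long[capacity];
        isNull = new boolean[capacity];
    }

    // Return the value at the given row, or null if that row is null.
    Long get(int row) {
        int i = isRepeating ? 0 : row;
        if (!noNulls && isNull[i]) {
            return null;
        }
        return vector[i];
    }
}

class BatchRowsSketch {
    // Convert the first `size` entries of each column into row-wise lists.
    static List<List<Long>> toRows(LongColumnSketch[] cols, int size) {
        List<List<Long>> rows = new ArrayList<>();
        for (int r = 0; r < size; r++) {
            List<Long> row = new ArrayList<>();
            for (LongColumnSketch col : cols) {
                row.add(col.get(r));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        LongColumnSketch a = new LongColumnSketch(4);
        a.vector[0] = 10;
        a.vector[1] = 20;
        a.vector[2] = 30;
        a.noNulls = false;
        a.isNull[1] = true;          // second row of column a is null

        LongColumnSketch b = new LongColumnSketch(4);
        b.isRepeating = true;        // every row of column b is 7
        b.vector[0] = 7;

        System.out.println(toRows(new LongColumnSketch[] {a, b}, 3));
    }
}
```

With the real reader, the same per-row indexing would sit inside a
while (rows.nextBatch(batch)) loop, iterating r from 0 to batch.size - 1.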

Here is a link to a simple example of reading data out of an ORC file:
https://github.com/apache/orc/blob/master/java/core/src/test/org/apache/orc/TestNewIntegerEncoding.java#L109

There are more examples in the JUnit tests:
https://github.com/apache/orc/tree/master/java/core/src/test/org/apache/orc

Hope this helps.

Thanks
Prasanth



Re: Example of Reading a ORC file

Posted by Kavinder Dhaliwal <ka...@gmail.com>.
Thanks a lot Owen and Prasanth. That's exactly what I was looking for.

Re: Example of Reading a ORC file

Posted by Owen O'Malley <om...@apache.org>.
To do column projection, you need to specify an include array in the
options to Reader.rows. It looks like this:

// options is an OrcFile.ReaderOptions, e.g. OrcFile.readerOptions(conf)
Reader reader = OrcFile.createReader(new Path(filename), options);
TypeDescription schema = reader.getSchema();
boolean[] include = new boolean[schema.getMaximumId() + 1];

// select only the first column to read
TypeDescription col0 = schema.getChildren().get(0);
for(int c=col0.getId(); c <= col0.getMaximumId(); ++c) {
  include[c] = true;
}
RecordReader rows = reader.rows(new Reader.Options().include(include));
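
The loop over getId() .. getMaximumId() matters when the selected column is
nested: column ids are assigned in pre-order with the root struct as id 0,
so a column's subtree occupies a contiguous id range. The sketch below
models that with a hypothetical TypeNodeSketch class (a stand-in written for
this illustration, not org.apache.orc.TypeDescription) so it runs without
the ORC jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema node; NOT org.apache.orc.TypeDescription.
class TypeNodeSketch {
    final String name;
    final List<TypeNodeSketch> children = new ArrayList<>();
    int id;     // pre-order id, root = 0
    int maxId;  // largest id inside this node's subtree

    TypeNodeSketch(String name, TypeNodeSketch... kids) {
        this.name = name;
        children.addAll(Arrays.asList(kids));
    }

    // Assign pre-order ids starting at `next`; returns the next unused id.
    int assignIds(int next) {
        id = next++;
        for (TypeNodeSketch child : children) {
            next = child.assignIds(next);
        }
        maxId = next - 1;
        return next;
    }
}

class IncludeArraySketch {
    // Mark every id in the subtree of the chosen top-level column.
    static boolean[] includeFor(TypeNodeSketch root, int topLevelIndex) {
        boolean[] include = new boolean[root.maxId + 1];
        TypeNodeSketch col = root.children.get(topLevelIndex);
        for (int c = col.id; c <= col.maxId; ++c) {
            include[c] = true;
        }
        return include;
    }

    public static void main(String[] args) {
        // Models struct<a:int, b:struct<x:int, y:string>, c:double>
        TypeNodeSketch root = new TypeNodeSketch("root",
            new TypeNodeSketch("a"),
            new TypeNodeSketch("b", new TypeNodeSketch("x"), new TypeNodeSketch("y")),
            new TypeNodeSketch("c"));
        root.assignIds(0);

        // Select column b: its own id plus the ids of x and y get marked.
        System.out.println(Arrays.toString(includeFor(root, 1)));
    }
}
```

Selecting b here marks three entries (b, x, y), which is why a single
include[col.getId()] = true would not be enough for a struct column.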

.. Owen
