Posted to dev@parquet.apache.org by Keith Chapman <ke...@gmail.com> on 2017/04/05 00:21:20 UTC

[PARQUET-CPP] Does Parquet cpp have a record reader interface?

Hi,

I'm trying to read a Parquet file which has a nested schema. I have seen that
the Java library has a record reader API which helps construct a record.
Does the cpp API have something equivalent? If not, what is the
recommendation for reading a nested Parquet file using the cpp API?

Regards,
Keith.

http://keith-chapman.com

Re: [PARQUET-CPP] Does Parquet cpp have a record reader interface?

Posted by Wes McKinney <we...@gmail.com>.
hi Keith,

That's right -- see
https://github.com/apache/parquet-cpp/blob/master/src/parquet/schema.cc#L577.
You're welcome to store the repetition level and the definition level
(the latter indicating null/not null) in the Node object.
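
A minimal sketch of how per-node levels could be derived by walking a schema
tree, assuming the usual Dremel-style rule: each OPTIONAL or REPEATED node on
the path adds one to the definition level, each REPEATED node adds one to the
repetition level, and the root is not counted. SchemaNode, AddChild and
AssignLevelsFromRoot are hypothetical names for illustration only; they are
not the parquet-cpp Node API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the parquet-cpp classes.
enum class Repetition { REQUIRED, OPTIONAL, REPEATED };

struct SchemaNode {
  SchemaNode(std::string n, Repetition r)
      : name(std::move(n)), repetition(r) {}

  // Appends a child and returns a non-owning pointer to it.
  SchemaNode* AddChild(std::string n, Repetition r) {
    children.emplace_back(new SchemaNode(std::move(n), r));
    return children.back().get();
  }

  std::string name;
  Repetition repetition;
  std::vector<std::unique_ptr<SchemaNode>> children;  // empty for a leaf
  int16_t max_definition_level = 0;
  int16_t max_repetition_level = 0;
};

// Stores on each node the levels implied by its ancestors and by itself.
void AssignLevels(SchemaNode* node, int16_t def, int16_t rep) {
  if (node->repetition == Repetition::OPTIONAL) {
    ++def;  // an optional node may be null: one more definition level
  } else if (node->repetition == Repetition::REPEATED) {
    ++def;  // a repeated node may be empty
    ++rep;  // and opens a new repetition scope
  }
  node->max_definition_level = def;
  node->max_repetition_level = rep;
  for (auto& child : node->children) {
    AssignLevels(child.get(), def, rep);
  }
}

// The root (message) node itself does not contribute to either level.
void AssignLevelsFromRoot(SchemaNode* root) {
  assert(root != nullptr);
  root->max_definition_level = 0;
  root->max_repetition_level = 0;
  for (auto& child : root->children) {
    AssignLevels(child.get(), 0, 0);
  }
}
```

With levels stored on intermediate nodes as well as leaves, a record
assembler can tell at which ancestor a null or a new repetition begins.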

- Wes

On Mon, Apr 17, 2017 at 7:43 PM, Keith Chapman <ke...@gmail.com>
wrote:

> Hi Wes,
>
> I was looking into how I could recreate a record from the column readers.
> From reading through the code I understand that the hierarchy is stored
> using NodePtrs. I don't see them storing the definition level or the
> repetition level though; am I missing something? From what I see, the
> max_repetition_level and the max_definition_level are only stored in the
> ColumnDescriptor and not in the Node.
>
> Regards,
> Keith.
>
> http://keith-chapman.com

Re: [PARQUET-CPP] Does Parquet cpp have a record reader interface?

Posted by Keith Chapman <ke...@gmail.com>.
Hi Wes,

I was looking into how I could recreate a record from the column readers.
From reading through the code I understand that the hierarchy is stored
using NodePtrs. I don't see them storing the definition level or the
repetition level though; am I missing something? From what I see, the
max_repetition_level and the max_definition_level are only stored in the
ColumnDescriptor and not in the Node.

Regards,
Keith.

http://keith-chapman.com

On Wed, Apr 5, 2017 at 2:26 PM, Keith Chapman <ke...@gmail.com>
wrote:

> Thanks for the info Wes. I looked around the code and did not find
> anything about how I could construct a row from a bunch of columnar
> readers. Reconstructing records from columns with a nested schema may be
> something that other folks are also interested in. I'm reading up on
> Parquet and trying to understand how I would do it, and would love
> to put it out there for feedback and potentially upstream it once I have
> something working.
>
> Regards,
> Keith.
>
> http://keith-chapman.com

Re: [PARQUET-CPP] Does Parquet cpp have a record reader interface?

Posted by Keith Chapman <ke...@gmail.com>.
Thanks for the info Wes. I looked around the code and did not find anything
about how I could construct a row from a bunch of columnar readers.
Reconstructing records from columns with a nested schema may be something
that other folks are also interested in. I'm reading up on Parquet and
trying to understand how I would do it, and would love to put it out there
for feedback and potentially upstream it once I have something working.
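
A minimal sketch of the simplest case of record reconstruction, assuming a
single top-level "repeated int64" column (maximum definition level 1, maximum
repetition level 1) whose levels and values have already been decoded into
plain vectors. AssembleLists is a hypothetical helper for illustration, not a
parquet-cpp function, and it does not cover deeper nesting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuilds one list per record from decoded levels and values.
// A repetition level of 0 starts a new record; a definition level equal to
// max_def_level means the slot carries a value, anything lower means the
// list is empty at that position and no value was stored for it.
std::vector<std::vector<int64_t>> AssembleLists(
    const std::vector<int16_t>& def_levels,
    const std::vector<int16_t>& rep_levels,
    const std::vector<int64_t>& values,
    int16_t max_def_level) {
  assert(def_levels.size() == rep_levels.size());

  std::vector<std::vector<int64_t>> records;
  std::size_t value_index = 0;
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    // Well-formed data begins with repetition level 0; the empty() check
    // only guards against malformed input.
    if (rep_levels[i] == 0 || records.empty()) {
      records.emplace_back();
    }
    if (def_levels[i] == max_def_level) {
      assert(value_index < values.size());
      records.back().push_back(values[value_index++]);
    }
  }
  return records;
}
```

For the three records [1, 2], [] and [3], the column would carry repetition
levels 0, 1, 0, 0, definition levels 1, 1, 0, 1 and values 1, 2, 3.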

Regards,
Keith.

http://keith-chapman.com

On Wed, Apr 5, 2017 at 2:20 PM, Wes McKinney <we...@gmail.com> wrote:

> hi Keith -- we have focused so far on columnar reads (i.e. Arrow) vs.
> row/record reads. We would welcome contributions to add a record
> reader interface.
>
> Thanks
> Wes

Re: [PARQUET-CPP] Does Parquet cpp have a record reader interface?

Posted by Wes McKinney <we...@gmail.com>.
hi Keith -- we have focused so far on columnar reads (i.e. Arrow) vs.
row/record reads. We would welcome contributions to add a record
reader interface.

Thanks
Wes

On Tue, Apr 4, 2017 at 8:21 PM, Keith Chapman <ke...@gmail.com> wrote:
> Hi,
>
> I'm trying to read a Parquet file which has a nested schema. I have seen that
> the Java library has a record reader API which helps construct a record.
> Does the cpp API have something equivalent? If not, what is the
> recommendation for reading a nested Parquet file using the cpp API?
>
> Regards,
> Keith.
>
> http://keith-chapman.com