Posted to dev@drill.apache.org by Damien Profeta <da...@amadeus.com> on 2017/09/13 23:17:03 UTC
Switch between new parquet reader and old one
Hi,
I was looking at the code that reads Parquet files and noticed there
is a switch, 'isComplex', that decides whether the new reader can be
used or whether we have to fall back to the old one.
The switch is based on all the columns in the file (complex types or
repetition levels), but it does not take into account which columns
actually have to be read.
If the current query only reads simple columns without repetition
levels, couldn't we use the new reader? It would be a minor
optimization, but it could be worthwhile.
Thanks
Damien
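A minimal sketch of the idea, for illustration only. It contrasts a file-wide check (every column in the file is inspected) with a projection-aware check (only the requested columns are inspected). The Column record, its fields, and the methods isComplexFile and needsOldReader are hypothetical names, not Drill's actual API. It also assumes that an empty projection means "SELECT *" and that projected names are top-level column names. Real projections can reference nested paths, which this sketch does not handle.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: choose between a "new" (flat) Parquet reader and the
// "old" reader. Names below are illustrative assumptions, not Drill's API.
public class ReaderChoice {

    // Minimal, hypothetical description of one top-level Parquet column.
    public record Column(String name, boolean complexType, int maxRepetitionLevel) {
        // A column forces the old reader if it is a nested group or is repeated.
        boolean forcesOldReader() {
            return complexType || maxRepetitionLevel > 0;
        }
    }

    // File-wide check: any complex or repeated column anywhere in the file
    // forces the old reader, regardless of what the query reads.
    public static boolean isComplexFile(List<Column> fileColumns) {
        return fileColumns.stream().anyMatch(Column::forcesOldReader);
    }

    // Projection-aware check: only the columns the query reads are inspected.
    // Assumption: an empty projection stands for "SELECT *", so all columns count.
    public static boolean needsOldReader(List<Column> fileColumns, Set<String> projected) {
        if (projected.isEmpty()) {
            return isComplexFile(fileColumns);
        }
        return fileColumns.stream()
                .filter(c -> projected.contains(c.name()))
                .anyMatch(Column::forcesOldReader);
    }

    public static void main(String[] args) {
        List<Column> schema = List.of(
                new Column("id", false, 0),
                new Column("price", false, 0),
                new Column("tags", false, 1),     // repeated column
                new Column("address", true, 0));  // nested group
        System.out.println(isComplexFile(schema));                         // true
        System.out.println(needsOldReader(schema, Set.of("id", "price"))); // false
        System.out.println(needsOldReader(schema, Set.of("id", "tags")));  // true
    }
}
```

With the file-wide check, the repeated "tags" column forces the old reader even for a query that only reads "id" and "price". The projection-aware check lets that query use the new reader.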
RE: Switch between new parquet reader and old one
Posted by Kunal Khatua <kk...@mapr.com>.
+1 on this, though in the long term, we ought to be able to handle even complex types within the Drill native reader.
-----Original Message-----
From: Jinfeng Ni [mailto:jni@apache.org]
Sent: Wednesday, September 13, 2017 4:29 PM
To: dev <de...@drill.apache.org>
Subject: Re: Switch between new parquet reader and old one
That makes sense. We should only check the requested columns, not every column in the Parquet file, to decide which Parquet reader to use.
Re: Switch between new parquet reader and old one
Posted by Jinfeng Ni <jn...@apache.org>.
That makes sense. We should only check the requested columns, not every
column in the Parquet file, to decide which Parquet reader to use.