
Posted to dev@drill.apache.org by Damien Profeta <da...@amadeus.com> on 2017/09/13 23:17:03 UTC

Switch between new parquet reader and old one

Hi,

I was looking at the code that reads Parquet files and noticed there
is a switch, 'isComplex', that decides whether the new reader can be
used or whether we have to fall back to the old one.
The switch is based on all the columns in the file (complex types or
repetition levels), but it doesn't take into account which columns
are actually read.

If the current query only reads simple columns without repetition
levels, couldn't we use the new reader? It would be a minor
optimization, but it could be worth it.

Thanks
Damien
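To make the proposal concrete, here is a minimal Java sketch (Java 16+ for records) contrasting the two rules: checking every column in the file versus checking only the requested ones. `ColumnInfo`, `isComplexWholeFile` and `isComplexProjected` are hypothetical names for illustration only, not Drill's actual code, and the column description is a simplified stand-in for real Parquet metadata.

```java
import java.util.List;
import java.util.Set;

public class ReaderChoiceSketch {

    // Hypothetical, simplified description of one Parquet column (not a Drill class).
    record ColumnInfo(String name, boolean complexType, int maxRepetitionLevel) {
        // A column needs the old reader if it is a nested/complex type
        // or can repeat (max repetition level > 0).
        boolean needsOldReader() {
            return complexType || maxRepetitionLevel > 0;
        }
    }

    // The behaviour described in the thread: look at every column in the file.
    static boolean isComplexWholeFile(List<ColumnInfo> fileColumns) {
        return fileColumns.stream().anyMatch(ColumnInfo::needsOldReader);
    }

    // The proposal: only look at the columns the query actually requests.
    // Exact top-level name matching is a simplification.
    static boolean isComplexProjected(List<ColumnInfo> fileColumns, Set<String> requested) {
        return fileColumns.stream()
                .filter(c -> requested.contains(c.name()))
                .anyMatch(ColumnInfo::needsOldReader);
    }

    public static void main(String[] args) {
        List<ColumnInfo> file = List.of(
                new ColumnInfo("id", false, 0),
                new ColumnInfo("price", false, 0),
                new ColumnInfo("tags", false, 1),     // repeated column
                new ColumnInfo("address", true, 0));  // nested group

        Set<String> requested = Set.of("id", "price");

        System.out.println("whole-file check -> old reader: " + isComplexWholeFile(file));
        System.out.println("projected check  -> old reader: " + isComplexProjected(file, requested));
    }
}
```

Running it prints `true` for the whole-file check and `false` for the projected check, since neither `id` nor `price` is nested or repeated. A real implementation would also have to handle `SELECT *` (every column requested), Drill's case-insensitive column names, and requests for a field inside a nested column; this sketch ignores all three.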

RE: Switch between new parquet reader and old one

Posted by Kunal Khatua <kk...@mapr.com>.

+1 on this, though in the long term, we ought to be able to handle even complex types within the Drill native reader. 

-----Original Message-----
From: Jinfeng Ni [mailto:jni@apache.org] 
Sent: Wednesday, September 13, 2017 4:29 PM
To: dev <de...@drill.apache.org>
Subject: Re: Switch between new parquet reader and old one

That makes sense. We should only check the requested columns, not every column in the Parquet file, to decide which Parquet reader to use.

Re: Switch between new parquet reader and old one

Posted by Jinfeng Ni <jn...@apache.org>.

That makes sense. We should only check the requested columns, not every
column in the Parquet file, to decide which Parquet reader to use.