Posted to issues@drill.apache.org by "Oleksandr Kalinin (JIRA)" <ji...@apache.org> on 2018/04/26 16:38:00 UTC

[jira] [Commented] (DRILL-5797) Use more often the new parquet reader

    [ https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454503#comment-16454503 ] 

Oleksandr Kalinin commented on DRILL-5797:
------------------------------------------

While debugging the complex12.q failure from the list of failing queries above, it appears that there is another issue, not related to case sensitivity.

If the file schema has a primitive column A and a repeated column B with a nested column A (B.A), then executing the query 'select A from ....' leads to the following scenario:

(1) The rowGroupScan passed to ParquetScanBatchCreator contains only column A. The code in the PR handles this correctly and allows the fast reader.
(2) However, the ParquetSchema passed to ReadStat contains both A and B.A, which leads to the failure explained above in this JIRA, because B.A is complex.
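
The mismatch between (1) and (2) can be illustrated with a minimal, self-contained sketch. It is not Drill code: the Column class and the allPrimitive helper are hypothetical names made up for illustration, and the sketch only assumes that the fast reader is eligible when every column in the list it is given is primitive.

```java
import java.util.Arrays;
import java.util.List;

public class FastReaderCheck {

    /** Hypothetical stand-in for a Parquet column: a dotted path plus a nested/repeated flag. */
    static final class Column {
        final String path;
        final boolean complex;

        Column(String path, boolean complex) {
            this.path = path;
            this.complex = complex;
        }
    }

    /** Assumed eligibility rule: the fast reader applies only if no column in the list is complex. */
    static boolean allPrimitive(List<Column> columns) {
        return columns.stream().noneMatch(c -> c.complex);
    }

    public static void main(String[] args) {
        Column a = new Column("A", false);   // primitive top-level column
        Column bA = new Column("B.A", true); // nested column inside repeated B

        // (1) What the scan projects for 'select A': only A.
        List<Column> projected = Arrays.asList(a);
        // (2) What the reader-side schema ends up containing: A and B.A.
        List<Column> readerSchema = Arrays.asList(a, bA);

        System.out.println("projection check: " + allPrimitive(projected));       // true
        System.out.println("reader schema check: " + allPrimitive(readerSchema)); // false
    }
}
```

The two checks disagree for the same query, which matches the symptom described: the fast reader is selected based on the projection, but then encounters the complex column B.A.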

This looks like an additional issue, though not related to the PR code. I could also reproduce the case sensitivity issue and am currently investigating both.

> Use more often the new parquet reader
> -------------------------------------
>
>                 Key: DRILL-5797
>                 URL: https://issues.apache.org/jira/browse/DRILL-5797
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: Damien Profeta
>            Assignee: Damien Profeta
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The choice between the regular parquet reader and the optimized one is based on what types of columns are in the file, but the columns actually read by the query are not taken into account. We can slightly increase the number of cases where the optimized reader is used by checking whether the projected columns are simple or not.
> This is an interim optimization until the fast parquet reader can handle complex structures.


