Posted to issues@drill.apache.org by "salim achouche (JIRA)" <ji...@apache.org> on 2017/10/06 01:39:00 UTC
[jira] [Commented] (DRILL-5847) Flat Parquet Reader Performance Analysis
[ https://issues.apache.org/jira/browse/DRILL-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194035#comment-16194035 ]
salim achouche commented on DRILL-5847:
---------------------------------------
Summary of the CPU-based enhancements:
*Improve CPU Cache Locality -*
* The current implementation performs row-based processing (for each row: for each column: ...)
* This is done to compute a batch size that fits within a fixed memory budget
* Unfortunately, even this expensive logic doesn't work, because data is populated using the Value Vectors setSafe(...) APIs
* This API provides no feedback to the caller, since it automatically grows the allocation when the new value doesn't fit
* We propose switching to columnar processing (for each column: for each row: ...) to maximize CPU cache locality
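The two loop orders and the "no feedback" problem can be illustrated with a minimal Java sketch. It assumes a hypothetical IntVector class whose setSafe(index, value) silently doubles its backing array, loosely mimicking the behavior described above; it is not Drill's actual ValueVector code, and the names fillRowWise and fillColumnar are illustrative only.

```java
import java.util.Arrays;

public class ColumnarFillSketch {

    // Hypothetical stand-in for a fixed-width value vector (NOT Drill's API).
    // Like the setSafe(...) behavior described above, it silently grows its
    // buffer when the index is out of range, so the caller is never told.
    static final class IntVector {
        private int[] data = new int[4];

        void setSafe(int index, int value) {
            while (index >= data.length) {
                data = Arrays.copyOf(data, data.length * 2); // silent reallocation
            }
            data[index] = value;
        }

        int get(int index) { return data[index]; }
        int capacity() { return data.length; }
    }

    // Row-based: for each row, for each column. Every inner iteration jumps
    // to a different column's source array and destination buffer.
    static void fillRowWise(int[][] source, IntVector[] vectors, int rowCount) {
        for (int row = 0; row < rowCount; row++) {
            for (int col = 0; col < vectors.length; col++) {
                vectors[col].setSafe(row, source[col][row]);
            }
        }
    }

    // Columnar: for each column, for each row. Consecutive writes stay within
    // one source array and one destination buffer, which tends to improve
    // CPU cache locality.
    static void fillColumnar(int[][] source, IntVector[] vectors, int rowCount) {
        for (int col = 0; col < vectors.length; col++) {
            IntVector v = vectors[col];
            int[] src = source[col];
            for (int row = 0; row < rowCount; row++) {
                v.setSafe(row, src[row]);
            }
        }
    }

    public static void main(String[] args) {
        int columns = 3, rows = 10;
        int[][] source = new int[columns][rows];
        for (int c = 0; c < columns; c++)
            for (int r = 0; r < rows; r++)
                source[c][r] = c * 100 + r;

        IntVector[] a = new IntVector[columns];
        IntVector[] b = new IntVector[columns];
        for (int c = 0; c < columns; c++) { a[c] = new IntVector(); b[c] = new IntVector(); }

        fillRowWise(source, a, rows);
        fillColumnar(source, b, rows);

        // Both loop orders must produce identical vectors.
        for (int c = 0; c < columns; c++)
            for (int r = 0; r < rows; r++)
                if (a[c].get(r) != b[c].get(r)) throw new AssertionError("mismatch");

        // Capacity grew from 4 to 16 without the caller being informed.
        System.out.println("capacity=" + a[0].capacity()); // capacity=16
        System.out.println("value=" + b[2].get(9));        // value=209
    }
}
```

Both strategies yield the same data; only the memory access order differs. Because setSafe never reports how much memory it consumed, a caller that wants to cap batch size must track it separately, which is why batch size enforcement is handled as a distinct effort below.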
*NOTE -*
* Memory batch size enforcement will be done under another JIRA that will target all operators (not just Parquet)
* [~Paul.Rogers] is leading this effort; I'll be implementing the Parquet scanner enforcement
> Flat Parquet Reader Performance Analysis
> ----------------------------------------
>
> Key: DRILL-5847
> URL: https://issues.apache.org/jira/browse/DRILL-5847
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Storage - Parquet
> Affects Versions: 1.11.0
> Reporter: salim achouche
> Assignee: salim achouche
> Labels: performance
> Fix For: 1.12.0
>
> Attachments: Drill Framework Enhancements.pdf
>
>
> This task is to analyze the Flat Parquet Reader logic, looking for performance improvement opportunities.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)