Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/03/02 23:37:45 UTC

[jira] [Resolved] (DRILL-5267) Managed external sort spills too often with Parquet data

     [ https://issues.apache.org/jira/browse/DRILL-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5267.
--------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 1.10)
                   1.10.0

> Managed external sort spills too often with Parquet data
> --------------------------------------------------------
>
>                 Key: DRILL-5267
>                 URL: https://issues.apache.org/jira/browse/DRILL-5267
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.10
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>
> DRILL-5266 describes how Parquet produces low-density record batches. Because of these batches, the external sort spills more frequently than it should: it sizes spill files by the overall size of each record batch, not by the data the batch actually holds. Since Parquet batches are 95% empty space, the spill files end up far too small.
> Adjust the spill calculations to use the actual data content of each batch, not the size of the overall record batch.
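
The arithmetic behind the description above can be sketched as follows. This is only an illustration, not Drill's actual code: the class SpillSizing, its methods, and all sizes (a 200 MiB spill-file target, a 20 MiB batch holding 1 MiB of data, i.e. 95% empty) are hypothetical values chosen to match the 95% figure in the issue.

```java
/**
 * Hypothetical illustration of spill sizing; these names and numbers are
 * assumptions for the example and are not taken from the Drill code base.
 */
final class SpillSizing {
    private SpillSizing() {}

    /** How many batches to gather per spill, given an estimated per-batch size. */
    static long batchesPerSpill(long targetSpillFileBytes, long estimatedBytesPerBatch) {
        if (targetSpillFileBytes <= 0 || estimatedBytesPerBatch <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Always spill at least one batch, even if it exceeds the target.
        return Math.max(1L, targetSpillFileBytes / estimatedBytesPerBatch);
    }

    /** Bytes that actually reach the spill file: only the data, not the empty space. */
    static long spillFileBytes(long batches, long dataBytesPerBatch) {
        return batches * dataBytesPerBatch;
    }

    public static void main(String[] args) {
        final long mib = 1024L * 1024L;
        final long target = 200 * mib;     // assumed spill-file target
        final long batchSize = 20 * mib;   // assumed overall size of one record batch
        final long dataSize = 1 * mib;     // assumed data content: 5% of the batch

        // Sizing by overall batch size: few batches per spill, tiny file.
        long byBatchSize = batchesPerSpill(target, batchSize);
        System.out.println("by batch size:   " + byBatchSize + " batches, "
                + spillFileBytes(byBatchSize, dataSize) / mib + " MiB written");

        // Sizing by data content: the file reaches the intended target.
        long byDataSize = batchesPerSpill(target, dataSize);
        System.out.println("by data content: " + byDataSize + " batches, "
                + spillFileBytes(byDataSize, dataSize) / mib + " MiB written");
    }
}
```

With these assumed numbers, sizing by overall batch size spills after 10 batches and writes only 10 MiB, while sizing by data content gathers 200 batches and writes the intended 200 MiB, so the same input produces far fewer spill files.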



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)