Posted to dev@drill.apache.org by "Padma Penumarthy (JIRA)" <ji...@apache.org> on 2018/02/15 03:00:00 UTC

[jira] [Created] (DRILL-6161) Allocate memory for outgoing vectors based on sizing calculations

Padma Penumarthy created DRILL-6161:
---------------------------------------

             Summary: Allocate memory for outgoing vectors based on sizing calculations
                 Key: DRILL-6161
                 URL: https://issues.apache.org/jira/browse/DRILL-6161
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Flow
    Affects Versions: 1.12.0
            Reporter: Padma Penumarthy
            Assignee: Padma Penumarthy
             Fix For: 1.13.0


Currently, Drill allocates memory for outgoing value vectors in one of two ways: either for the maximum of 64K entries up front, or starting at 4096 entries and doubling as more memory is needed. Every time a vector doubles, we allocate a new vector, copy the existing data into it, and zero-fill the new half, which carries a performance penalty. As part of the batch sizing project, we limit the number of rows in the outgoing batch based on memory, using sizing information from the incoming batch(es). Since we then know the number of rows and the average size of each column in the outgoing batch, we should use that information to preallocate memory for the outgoing vectors. This will be done as each operator is changed to produce batches of the configured size.
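
To illustrate the cost being avoided, here is a minimal sketch in plain Java. It does not use Drill's vector or allocator classes; the class name VectorSizingSketch and both method names are hypothetical. It counts how many reallocate-and-copy steps the doubling strategy needs to reach a given row count, and computes the size a single up-front allocation would need from the row count and average column width.

```java
// Hypothetical illustration only; none of these names are part of Drill's API.
final class VectorSizingSketch {

    // Starting entry count for the doubling strategy, as described in the issue.
    static final int INITIAL_CAPACITY = 4096;

    /**
     * Counts how many times a vector that starts at INITIAL_CAPACITY entries
     * must double (allocate a new buffer, copy, zero-fill) to hold rowCount entries.
     */
    static int doublingsNeeded(int rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("rowCount must be non-negative");
        }
        long capacity = INITIAL_CAPACITY; // long avoids overflow for large row counts
        int doublings = 0;
        while (capacity < rowCount) {
            capacity *= 2;
            doublings++;
        }
        return doublings;
    }

    /**
     * Bytes to reserve up front for one column's data, given the planned
     * number of rows and the average size of a value in that column.
     */
    static long preallocationBytes(int rowCount, double avgBytesPerValue) {
        if (rowCount < 0 || avgBytesPerValue < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return (long) Math.ceil(rowCount * avgBytesPerValue);
    }
}
```

Under these assumptions, a batch of 65536 rows reaches its final size only after four doublings from 4096 entries, whereas a column averaging 12.5 bytes per value over 1000 rows could be given 12500 bytes in a single allocation.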

Another possible improvement is to pack the value vectors as densely as possible to improve overall memory utilization. Since we allocate memory in powers of 2, once we figure out the number of rows to include in the outgoing batch, we can round that number down to the closest power of 2 and allocate memory for that many rows.
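
The rounding step can be sketched as follows, again in plain Java with hypothetical names (PowerOfTwoRowsSketch and its methods are not Drill classes). The sketch assumes a fixed-width column and an allocator that rounds each request up to the next power of 2, which is one reading of "we allocate memory in powers of 2".

```java
// Hypothetical illustration only; none of these names are part of Drill's API.
final class PowerOfTwoRowsSketch {

    /** Largest power of 2 that is less than or equal to rowCount. */
    static int roundDownToPowerOfTwo(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        return Integer.highestOneBit(rowCount);
    }

    /** Smallest power of 2 that is greater than or equal to bytes (assumed allocator behavior). */
    static long roundUpToPowerOfTwo(long bytes) {
        if (bytes < 1) {
            throw new IllegalArgumentException("bytes must be positive");
        }
        return bytes == 1 ? 1 : Long.highestOneBit(bytes - 1) << 1;
    }

    /** Fraction of the allocated buffer actually occupied by row data. */
    static double utilization(int rowCount, int bytesPerValue) {
        long used = (long) rowCount * bytesPerValue;
        return (double) used / roundUpToPowerOfTwo(used);
    }
}
```

For example, with 4-byte values, 5000 rows need 20000 bytes, which this assumed allocator rounds up to 32768, so roughly 61% of the buffer is used. Rounding the row count down to 4096 gives 16384 bytes, which is already a power of 2, so the buffer is fully used.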

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)