Posted to dev@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2019/12/13 03:01:00 UTC

[jira] [Resolved] (DRILL-5272) Text file reader is inefficient

     [ https://issues.apache.org/jira/browse/DRILL-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5272.
--------------------------------
    Resolution: Fixed

This issue was fixed when converting the text readers to use the result set loader framework.

> Text file reader is inefficient
> -------------------------------
>
>                 Key: DRILL-5272
>                 URL: https://issues.apache.org/jira/browse/DRILL-5272
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Observations from inspecting ScanBatch and CompliantTextReader:
> Every batch holds about five implicit vectors. Their values are repeated for every row, which can greatly increase the size of the incoming data.
> When populating the vectors, the allocation starts at 8 bytes and grows to 16 bytes, causing a (slow) memory reallocation for every vector:
> {code}
> [org.apache.drill.exec.vector.UInt4Vector] - 
> Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [8] -> [16]
> {code}
> Whether because of the per-row implicit vectors and reallocations, or because of a different issue, memory grows in the scan batch:
> {code}
> Entry Memory: 6,456,448
> Exit Memory: 7,636,312
> Entry Memory: 7570560
> Exit Memory: 8750424
> ...
> {code}
> Evidently the implicit vectors are added in response to a "SELECT *" query. Perhaps they should be provided only when actually requested.
> The vectors are populated for every row, making a copy of a potentially long file name and path for every record. Since the values are common to every record, perhaps we can use the same data copy for each, but have the offset vector for each record just point to the single copy.
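
The single-copy idea in the last paragraph of the issue can be illustrated with a minimal Java sketch. Everything here is hypothetical: the class names, the file path, and the layout (one start offset per row plus one shared length) are assumptions made for illustration, and they may differ from how Drill's offset vectors are actually laid out. It is not the change that resolved this issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; these classes are not part of Drill.
class ImplicitColumnSketch {

    /** Copies the value into the data buffer once per row, as the issue describes. */
    static final class PerRowCopy {
        final byte[] data;
        final int[] offsets; // offsets[i]..offsets[i + 1] bounds the bytes of row i

        PerRowCopy(String value, int rowCount) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            data = new byte[bytes.length * rowCount];
            offsets = new int[rowCount + 1];
            for (int row = 0; row < rowCount; row++) {
                System.arraycopy(bytes, 0, data, row * bytes.length, bytes.length);
                offsets[row + 1] = (row + 1) * bytes.length;
            }
        }

        String get(int row) {
            int length = offsets[row + 1] - offsets[row];
            return new String(data, offsets[row], length, StandardCharsets.UTF_8);
        }
    }

    /** Stores the value once; every row's start offset points at that single copy. */
    static final class SharedCopy {
        final byte[] data;
        final int[] starts; // all zero: every row refers to the same bytes

        SharedCopy(String value, int rowCount) {
            data = value.getBytes(StandardCharsets.UTF_8);
            starts = new int[rowCount];
        }

        String get(int row) {
            return new String(data, starts[row], data.length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        String path = "/data/logs/2017/01/example.csv"; // made-up file path
        int rows = 1000;
        PerRowCopy copied = new PerRowCopy(path, rows);
        SharedCopy shared = new SharedCopy(path, rows);
        System.out.println("per-row data bytes: " + copied.data.length);
        System.out.println("shared data bytes:  " + shared.data.length);
    }
}
```

In this sketch the data buffer of the per-row variant grows linearly with the row count, while the shared variant stays at the size of one value; only the per-row offsets remain.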

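The reallocation cost mentioned in the issue (8 bytes growing to 16 bytes) can be estimated with a small sketch. It assumes the buffer keeps doubling until it reaches the needed size and that each reallocation copies the entire old buffer. The log line shows only the first step, so continued doubling is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical illustration only; not Drill's allocator.
class ReallocationSketch {

    /** Number of doublings needed to grow from initialBytes to at least targetBytes. */
    static int doublingsNeeded(long initialBytes, long targetBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        int count = 0;
        for (long size = initialBytes; size < targetBytes; size *= 2) {
            count++;
        }
        return count;
    }

    /** Total bytes copied, assuming every reallocation copies the whole old buffer. */
    static long bytesCopied(long initialBytes, long targetBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        long copied = 0;
        for (long size = initialBytes; size < targetBytes; size *= 2) {
            copied += size;
        }
        return copied;
    }

    public static void main(String[] args) {
        long target = 4096; // example target size in bytes, chosen arbitrarily
        System.out.println("doublings from 8 bytes: " + doublingsNeeded(8, target));
        System.out.println("bytes copied on the way: " + bytesCopied(8, target));
        System.out.println("doublings if preallocated: " + doublingsNeeded(target, target));
    }
}
```

Under these assumptions, starting at the final size avoids every intermediate reallocation, which is why a larger initial allocation is attractive for vectors whose approximate size is known in advance.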


--
This message was sent by Atlassian Jira
(v8.3.4#803005)