Posted to issues@drill.apache.org by "Pritesh Maker (JIRA)" <ji...@apache.org> on 2018/08/21 18:17:00 UTC
[jira] [Updated] (DRILL-6688) Data batches for Project operator exceed the maximum specified

     [ https://issues.apache.org/jira/browse/DRILL-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6688:
---------------------------------
    Reviewer: Boaz Ben-Zvi

> Data batches for Project operator exceed the maximum specified
> --------------------------------------------------------------
>
>                 Key: DRILL-6688
>                 URL: https://issues.apache.org/jira/browse/DRILL-6688
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.14.0
>            Reporter: Robert Hou
>            Assignee: Karthikeyan Manivannan
>            Priority: Major
>             Fix For: 1.15.0
>
>
> I ran this query:
> alter session set `drill.exec.memory.operator.project.output_batch_size` = 131072;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select
> chr(101) CharacterValuea,
> chr(102) CharacterValueb,
> chr(103) CharacterValuec,
> chr(104) CharacterValued,
> chr(105) CharacterValuee
> from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`;
> The output has 1024 identical lines:
> e f g h i
> There is one incoming batch:
> 2018-08-09 15:50:14,794 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:
> { Records: 60000, Total size: 0, Data size: 300000, Gross row width: 0, Net row width: 5, Density: 0% }
> Batch schema & sizes:
> { `_DEFAULT_COL_TO_READ_`(type: OPTIONAL INT, count: 60000, Per entry: std data size: 4, std net size: 5, actual data size: 4, actual net size: 5 Totals: data size: 240000, net size: 300000) }
> }
> There are four outgoing batches. All are too large. The first three look like this:
> 2018-08-09 15:50:14,799 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size:
> { Records: 16383, Total size: 0, Data size: 409575, Gross row width: 0, Net row width: 25, Density: 0% }
> Batch schema & sizes:
> { CharacterValuea(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValueb(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValuec(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValued(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> CharacterValuee(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }
> }
> The last batch is smaller because it has the remaining records.
> The data size (409575) exceeds the maximum batch size (131072).
> character415.q
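
The figures in the report can be cross-checked with simple arithmetic. The sketch below only uses numbers copied from the BATCH_STATS log lines above. The helper names are hypothetical, and the "rows that would fit" estimate is a naive division by net row width, not a description of how ProjectMemoryManager actually sizes batches.

```python
# Figures copied from the report and its BATCH_STATS log lines.
MAX_BATCH_BYTES = 131072     # drill.exec.memory.operator.project.output_batch_size
RECORDS_PER_BATCH = 16383    # "Records" in each of the first three outgoing batches
NET_BYTES_PER_ENTRY = 5      # "actual net size" per VARCHAR entry
NUM_COLUMNS = 5              # CharacterValuea .. CharacterValuee


def batch_net_size(records, net_bytes_per_entry, num_columns):
    """Total net size of a batch: records x per-entry net size x columns."""
    return records * net_bytes_per_entry * num_columns


def max_records_within(limit_bytes, net_row_width):
    """Naive upper bound on rows that fit in limit_bytes (assumption, not Drill's logic)."""
    return limit_bytes // net_row_width


net_row_width = NET_BYTES_PER_ENTRY * NUM_COLUMNS
observed = batch_net_size(RECORDS_PER_BATCH, NET_BYTES_PER_ENTRY, NUM_COLUMNS)

print("net row width:", net_row_width)
print("observed batch size:", observed)
print("over limit by factor:", round(observed / MAX_BATCH_BYTES, 2))
print("rows that would fit:", max_records_within(MAX_BATCH_BYTES, net_row_width))
```

Under these assumptions the logged data size of 409575 bytes is reproduced exactly (16383 records at a net row width of 25), which is roughly three times the configured 131072-byte limit. A batch at that row width would need to hold about 5242 records or fewer to stay under the limit.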



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)