Posted to issues@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2017/01/27 16:31:24 UTC

[jira] [Updated] (DRILL-5228) External Sort : Several operators in the attached query profile take more time than expected

     [ https://issues.apache.org/jira/browse/DRILL-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Challapalli updated DRILL-5228:
-------------------------------------
    Description: 
Environment :
{code}
git.commit.id.abbrev=2af709f
DRILL_MAX_DIRECT_MEMORY="32G"
DRILL_MAX_HEAP="4G"
{code}

Data Set : 
{code}
Size : ~18 GB
Number of Columns : 1
Column Width : 250 bytes
{code}

Query (took ~127 minutes to complete) :
{code}
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
select *
from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0]) d
where d.columns[0] = 'ljdfhwuehnoiueyf';
{code}
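
For reference, the planner.memory.max_query_memory_per_node value above converts as follows. This is a small hypothetical Python helper (not part of Drill), assuming GB means 10^9 bytes and GiB means 2^30 bytes. It only converts the configured number and says nothing about how Drill divides that budget among operators.

```python
# Hypothetical helper (not part of Drill): convert the
# planner.memory.max_query_memory_per_node value from the query above
# into decimal GB and binary GiB.
MAX_QUERY_MEMORY_PER_NODE = 14106127360  # bytes, from the alter session statement


def to_gb(n_bytes: int) -> float:
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9


def to_gib(n_bytes: int) -> float:
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    print(f"{to_gb(MAX_QUERY_MEMORY_PER_NODE):.2f} GB")    # 14.11 GB
    print(f"{to_gib(MAX_QUERY_MEMORY_PER_NODE):.2f} GiB")  # 13.14 GiB
```

The budget is roughly 14.1 GB (13.14 GiB), the same order as the 12.48 GB the scan had read when the sort reached its memory limit. On-disk text size and in-memory vector size are not directly comparable, so this is only a rough cross-check.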

*Selection Vector Remover*
{code}
Time spent based on profile : 7m58s
Problem : Since the external sort spilled to disk in this case, the Selection Vector Remover should have been a no-op. There is no clear justification for the time it spent.
{code}

*Text Sub Scan*
{code}
Time spent based on profile : 13m25s
Problem : I captured the profile screenshot (before-spill.png) once the sort's memory allocation reached its limit. Based on this, the scan took 2m13s to read the first 12.48 GB of data before sorting/spilling began, but ~11 minutes for the remaining ~5.5 GB.
{code}
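
The Text Sub Scan numbers imply a large drop in scan throughput once spilling began. A rough Python sketch of the arithmetic, assuming the data set is exactly 18 GB (decimal units, 1 GB = 1000 MB) and that the 13m25s total splits into the 2m13s before the spill plus everything after it:

```python
# Rough throughput arithmetic for the Text Sub Scan timings above.
# Assumptions (not from the profile itself): the data set is exactly 18 GB,
# 1 GB = 1000 MB, and the 13m25s total = 2m13s before the spill + the rest after it.
TOTAL_GB = 18.0
BEFORE_SPILL_GB = 12.48
BEFORE_SPILL_S = 2 * 60 + 13   # 2m13s
TOTAL_SCAN_S = 13 * 60 + 25    # 13m25s


def mb_per_s(gb: float, seconds: float) -> float:
    """Average throughput in MB/s for `gb` gigabytes read in `seconds`."""
    return gb * 1000 / seconds


after_gb = TOTAL_GB - BEFORE_SPILL_GB      # ~5.5 GB read after spilling began
after_s = TOTAL_SCAN_S - BEFORE_SPILL_S    # 672 s, i.e. 11m12s
before_rate = mb_per_s(BEFORE_SPILL_GB, BEFORE_SPILL_S)
after_rate = mb_per_s(after_gb, after_s)

if __name__ == "__main__":
    print(f"before spill: {before_rate:.1f} MB/s")           # 93.8 MB/s
    print(f"after spill:  {after_rate:.1f} MB/s")            # 8.2 MB/s
    print(f"slowdown:     {before_rate / after_rate:.1f}x")  # 11.4x
```

That is roughly 94 MB/s before the spill versus roughly 8 MB/s after it, an ~11x slowdown. The ratio does not depend on the unit convention chosen.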

*Projects*
{code}
Timings for the 4 Project operators, based on the profile. I do not have a specific reason to suspect them, but these numbers seem high.
Project 1 : 4m54s
Project 2 : 3m07s
Project 3 : 4m10s
Project 4 : 0.003s
{code}
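
Summing the Project timings gives a sense of their combined weight. A small Python sketch, treating the ~127-minute runtime as exactly 127 minutes (an approximation):

```python
# Sum of the four Project operator timings listed above, compared with the
# ~127-minute query runtime (treated here as exactly 127 minutes).
def to_seconds(minutes: int = 0, seconds: float = 0.0) -> float:
    """Convert a minutes/seconds profile duration to seconds."""
    return minutes * 60 + seconds


project_times_s = [
    to_seconds(4, 54),     # Project 1: 4m54s
    to_seconds(3, 7),      # Project 2: 3m07s
    to_seconds(4, 10),     # Project 3: 4m10s
    to_seconds(0, 0.003),  # Project 4: 0.003s
]
total_s = sum(project_times_s)     # 731.003 s
share = total_s / to_seconds(127)  # fraction of the total query runtime

if __name__ == "__main__":
    print(f"projects total: {int(total_s // 60)}m{int(total_s % 60):02d}s")  # 12m11s
    print(f"share of runtime: {share:.1%}")  # 9.6%
```

Together the projects account for about 12m11s, roughly 9.6% of the total runtime.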

The time spent in the external sort, as shown in the profile, is wrong. This is reported in DRILL-5227.

  was:
Same as the description above, except that the Data Set section listed Column Width : 256 bytes.

        Summary: External Sort : Several operators in the attached query profile take more time than expected  (was: Several operators in the attached query profile take more time than expected)

> External Sort : Several operators in the attached query profile take more time than expected
> --------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5228
>                 URL: https://issues.apache.org/jira/browse/DRILL-5228
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>         Attachments: 2775bf26-404a-4fee-ba3d-740a191899fa.sys.drill, before-spill.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)