Posted to issues@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2015/05/13 02:15:00 UTC

[jira] [Commented] (DRILL-3044) Very deep record batch fetching stack for single table query (TestTpchLimit0.tpch01)

    [ https://issues.apache.org/jira/browse/DRILL-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541084#comment-14541084 ] 

Jinfeng Ni commented on DRILL-3044:
-----------------------------------

Here is the verbose physical plan for tpch01 limit 0.

{code}
Drill Physical : 
00-00    Screen : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {222948.375 rows, 5874703.48590918 cpu, 0.0 io, 0.0 network, 1035010.0000000001 memory}, id = 726
00-01      Project(l_returnflag=[$0], l_linestatus=[$1], sum_qty=[$2], sum_base_price=[$3], sum_disc_price=[$4], sum_charge=[$5], avg_qty=[$6], avg_price=[$7], avg_disc=[$8], count_order=[$9]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {222647.5 rows, 5874402.61090918 cpu, 0.0 io, 0.0 network, 1035010.0000000001 memory}, id = 725
00-02        SelectionVectorRemover : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {222647.5 rows, 5874402.61090918 cpu, 0.0 io, 0.0 network, 1035010.0000000001 memory}, id = 724
00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {219638.75 rows, 5871393.86090918 cpu, 0.0 io, 0.0 network, 1035010.0000000001 memory}, id = 723
00-04            SelectionVectorRemover : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {219638.75 rows, 5871393.86090918 cpu, 0.0 io, 0.0 network, 1035010.0000000001 memory}, id = 722
00-05              Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {216630.0 rows, 5868385.11090918 cpu, 0.0 io, 0.0 network, 1035010.0000000001 memory}, id = 721
00-06                Project(l_returnflag=[$0], l_linestatus=[$1], sum_qty=[CASE(=($3, 0), null, $2)], sum_base_price=[CASE(=($5, 0), null, $4)], sum_disc_price=[CASE(=($7, 0), null, $6)], sum_charge=[CASE(=($9, 0), null, $8)], avg_qty=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL], avg_price=[CAST(/(CastHigh(CASE(=($5, 0), null, $4)), $5)):ANY NOT NULL], avg_disc=[CAST(/(CastHigh(CASE(=($11, 0), null, $10)), $11)):ANY NOT NULL], count_order=[$12]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY sum_qty, ANY sum_base_price, ANY sum_disc_price, ANY sum_charge, ANY avg_qty, ANY avg_price, ANY avg_disc, BIGINT count_order): rowcount = 3008.75, cumulative cost = {213621.25 rows, 5590257.5 cpu, 0.0 io, 0.0 network, 794310.0000000001 memory}, id = 720
00-07                  HashAgg(group=[{0, 1}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($3)], agg#3=[COUNT($3)], agg#4=[$SUM0($4)], agg#5=[COUNT($4)], agg#6=[$SUM0($5)], agg#7=[COUNT($5)], agg#8=[$SUM0($6)], agg#9=[COUNT($6)], count_order=[COUNT()]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY $f2, BIGINT $f3, ANY $f4, BIGINT $f5, ANY $f6, BIGINT $f7, ANY $f8, BIGINT $f9, ANY $f10, BIGINT $f11, BIGINT count_order): rowcount = 3008.75, cumulative cost = {210612.5 rows, 5506012.5 cpu, 0.0 io, 0.0 network, 794310.0000000001 memory}, id = 719
00-08                    Project(l_returnflag=[$0], l_linestatus=[$1], l_quantity=[$3], l_extendedprice=[$4], $f4=[*($4, -(1, $5))], $f5=[*(*($4, -(1, $5)), +(1, $6))], l_discount=[$5]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY l_quantity, ANY l_extendedprice, ANY $f4, ANY $f5, ANY l_discount): rowcount = 30087.5, cumulative cost = {180525.0 rows, 1053062.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 718
00-09                      SelectionVectorRemover : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY l_shipdate, ANY l_quantity, ANY l_extendedprice, ANY l_discount, ANY l_tax): rowcount = 30087.5, cumulative cost = {150437.5 rows, 812362.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 717
00-10                        Filter(condition=[<=($2, 1998-08-03)]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY l_shipdate, ANY l_quantity, ANY l_extendedprice, ANY l_discount, ANY l_tax): rowcount = 30087.5, cumulative cost = {120350.0 rows, 782275.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 716
00-11                          Project(l_returnflag=[$1], l_linestatus=[$6], l_shipdate=[$5], l_quantity=[$2], l_extendedprice=[$3], l_discount=[$0], l_tax=[$4]) : rowType = RecordType(ANY l_returnflag, ANY l_linestatus, ANY l_shipdate, ANY l_quantity, ANY l_extendedprice, ANY l_discount, ANY l_tax): rowcount = 60175.0, cumulative cost = {60175.0 rows, 421225.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 715
00-12                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=/tpch/lineitem.parquet, numFiles=1, columns=[`l_returnflag`, `l_linestatus`, `l_shipdate`, `l_quantity`, `l_extendedprice`, `l_discount`, `l_tax`]]]) : rowType = RecordType(ANY l_discount, ANY l_returnflag, ANY l_quantity, ANY l_extendedprice, ANY l_tax, ANY l_shipdate, ANY l_linestatus): rowcount = 60175.0, cumulative cost = {60175.0 rows, 421225.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 714
{code}

Below the Screen operator there are 12 operators in total; 3 of them are SelectionVectorRemovers, which are inserted simply because some execution operators cannot handle the output of a sorted/filtered record batch.  I'm not clear on how execution ends up with such a deep stack (does each operator trigger 2 getChildren() calls? There are 24 getChildren() frames in the trace).  From the plan's perspective, though, there does not seem to be much room for improvement.
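As a toy sketch (not Drill's actual ImplCreator code; the names only mirror the frames in the trace below), the mutually recursive getRecordBatch/getChildren pattern adds two stack frames per operator, so even a linear 12-operator chain produces a deep instantiation stack:

```python
import sys

def stack_depth():
    # Count frames by walking the call stack.
    depth, frame = 0, sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

def get_record_batch(op, depths):
    # Mirrors ImplCreator.getRecordBatch: record our depth, then recurse
    # into the children via a second function, adding two frames per level.
    depths.append(stack_depth())
    return (op["name"], get_children(op, depths))

def get_children(op, depths):
    # Mirrors ImplCreator.getChildren: build each child's batch in turn.
    batches = []
    for child in op.get("children", []):
        batches.append(get_record_batch(child, depths))
    return batches

# A linear chain of 12 operators, like the tpch01 plan below Screen.
plan = {"name": "op-12"}
for i in range(11, 0, -1):
    plan = {"name": f"op-{i:02d}", "children": [plan]}

depths = []
get_record_batch(plan, depths)
print(len(depths))                 # 12 operators visited
print(depths[-1] - depths[0])      # 22: two extra frames per level
```

Under this simplified model a chain of N operators costs roughly 2*N frames just to instantiate, which is consistent in shape (though not necessarily in exact count) with the repeated getRecordBatch/getChildren pairs in the reported stack.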



> Very deep record batch fetching stack for single table query (TestTpchLimit0.tpch01)
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-3044
>                 URL: https://issues.apache.org/jira/browse/DRILL-3044
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Chris Westin
>            Assignee: Jinfeng Ni
>
> I ran TestTpchLimit0 in a constrained memory environment while hunting for a memory leak.
> Here are the startup parameters (from Eclipse's test launch configuration):
> -Xms512m
> -Xmx3g
> -Ddrill.exec.http.enabled=false
> -Ddrill.exec.sys.store.provider.local.write=false
> -Dorg.apache.drill.exec.server.Drillbit.system_options="org.apache.drill.exec.compile.ClassTransformer.scalar_replacement=on"
> -XX:MaxPermSize=256M -XX:MaxDirectMemorySize=3072M
> -XX:+CMSClassUnloadingEnabled -ea
> -Ddrill.exec.memory.top.max=67108864
> Except for the last value, these were taken from the root pom.xml; the last value constrains the amount of direct memory used to 64M. (We're looking for leaks that happen when queries fail to allocate memory, have to be cancelled, and aren't cleaned up properly.)
> I find that there is indeed a leak for tpch01 when the fragment is cleaned up. tpch01 looks like this:
> select
>   l_returnflag,
>   l_linestatus,
>   sum(l_quantity) as sum_qty,
>   sum(l_extendedprice) as sum_base_price,
>   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
>   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
>   avg(l_quantity) as avg_qty,
>   avg(l_extendedprice) as avg_price,
>   avg(l_discount) as avg_disc,
>   count(*) as count_order
> from
>   cp.`tpch/lineitem.parquet`
> where
>   l_shipdate <= date '1998-12-01' - interval '120' day (3)
> group by
>   l_returnflag,
>   l_linestatus
> order by
>   l_returnflag,
>   l_linestatus;
> Basically a single table query with a group and sort.
> But in the trace file, this is the stack at the time of the creation of the leaked allocator:
>     org.apache.drill.exec.ops.FragmentContext.getNewChildAllocator:302
>     org.apache.drill.exec.ops.OperatorContextImpl.<init>:43
>     org.apache.drill.exec.ops.FragmentContext.newOperatorContext:366
>     org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch:70
>     org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch:1
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:140
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch:121
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren:163
>     org.apache.drill.exec.physical.impl.ImplCreator.getRootExec:96
>     org.apache.drill.exec.physical.impl.ImplCreator.getExec:77
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run:199
> That seems like it's too deep for this query.
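Two of the constants above can be sanity-checked: the planner constant-folds the tpch01 predicate `l_shipdate <= date '1998-12-01' - interval '120' day` into the literal 1998-08-03 that appears in the Filter node of the physical plan, and the `drill.exec.memory.top.max=67108864` flag is exactly the 64M direct-memory cap mentioned in the description:

```python
from datetime import date, timedelta

# The tpch01 filter cutoff: 120 days before 1998-12-01.
cutoff = date(1998, 12, 1) - timedelta(days=120)
print(cutoff)  # 1998-08-03, matching the Filter condition in the plan

# The last startup flag, expressed in bytes.
print(67108864 == 64 * 1024 * 1024)  # True: a 64 MiB cap
```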



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)