Posted to dev@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2018/06/29 20:51:00 UTC
[jira] [Created] (DRILL-6565) cume_dist does not return enough rows

Robert Hou created DRILL-6565:
---------------------------------

             Summary: cume_dist does not return enough rows
                 Key: DRILL-6565
                 URL: https://issues.apache.org/jira/browse/DRILL-6565
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.14.0
            Reporter: Robert Hou
            Assignee: Pritesh Maker
         Attachments: drillbit.log.7802

This query should return 64 rows but only returns 38 rows:
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select * from (
  select cume_dist() over (order by Index) IntervalSecondValuea, Index
  from (select * from dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet` order by BigIntvalue)
) d where d.Index = 1;

I tried to reproduce the problem with a smaller table, and separately without the outer select statement, but the problem does not reproduce in either case.
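
For reference, here is a minimal Python sketch of the semantics the query above is expected to have. It assumes standard SQL CUME_DIST behavior (rows less than or equal to the current row, divided by total rows) and non-null, comparable values. The function names and the sample data are hypothetical and are not taken from the parquet file. Because the filter is applied after the window function, it should keep every row whose Index equals the requested value:

```python
from bisect import bisect_right


def cume_dist(sorted_values):
    """Return CUME_DIST for each row of an ascending-sorted list.

    For each value: (number of rows <= that value) / (total rows).
    """
    total = len(sorted_values)
    return [bisect_right(sorted_values, v) / total for v in sorted_values]


def filter_after_window(values, wanted):
    """Sort, compute CUME_DIST over all rows, then keep rows equal to `wanted`."""
    ordered = sorted(values)
    dist = cume_dist(ordered)
    return [(d, v) for d, v in zip(dist, ordered) if v == wanted]


# Hypothetical sample data, chosen only for illustration.
sample = [3, 1, 2, 1, 1, 2]
rows = filter_after_window(sample, 1)

# The filter must not drop any matching row.
assert len(rows) == sample.count(1)
```

Under these assumptions the number of rows returned should equal the number of rows with Index = 1 in the table, which is the property the failing query violates (38 rows instead of 64).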

Here is the explain plan:
{noformat}
00-00    Screen : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {757200.0 rows, 1.1573335922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4034
00-01      ProjectAllowDup(IntervalSecondValuea=[$0], Index=[$1]) : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {756000.0 rows, 1.1572135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4033
00-02        Project(w0$o0=[$1], $0=[$0]) : rowType = RecordType(DOUBLE w0$o0, ANY $0): rowcount = 12000.0, cumulative cost = {744000.0 rows, 1.1548135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4032
00-03          SelectionVectorRemover : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {732000.0 rows, 1.1524135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4031
00-04            Filter(condition=[=($0, 1)]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {720000.0 rows, 1.1512135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4030
00-05              Window(window#0=[window(partition {} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [CUME_DIST()])]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 80000.0, cumulative cost = {640000.0 rows, 1.1144135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4029
00-06                SelectionVectorRemover : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {560000.0 rows, 1.0984135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4028
00-07                  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {480000.0 rows, 1.0904135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4027
00-08                    Project($0=[ITEM($0, 'Index')]) : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {400000.0 rows, 5692067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4026
00-09                      SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {320000.0 rows, 5612067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4025
00-10                        Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {240000.0 rows, 5532067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4024
00-11                          Project(T2¦¦**=[$0], BigIntvalue=[$1]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {160000.0 rows, 320000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4023
00-12                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet]], selectionRoot=maprfs:/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet, numFiles=1, numRowGroups=6, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {80000.0 rows, 160000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4022
{noformat}

I have attached the drillbit.log.

The commit id is:
| 1.14.0-SNAPSHOT  | aa127b70b1e46f7f4aa19881f25eda583627830a  | DRILL-6523: Fix NPE for describe of partial schema  | 22.06.2018 @ 11:28:23 PDT  | rhou@mapr.com  | 23.06.2018 @ 02:05:10 PDT  |

fourvarchar_asc_nulls95.q



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)