Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/03/20 21:43:39 UTC

[jira] [Updated] (DRILL-2504) Aggregate query with grouping results in Error

     [ https://issues.apache.org/jira/browse/DRILL-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-2504:
----------------------------------
    Component/s:     (was: Execution - Relational Operators)
                 Functions - Drill

> Aggregate query with grouping results in Error
> ----------------------------------------------
>
>                 Key: DRILL-2504
>                 URL: https://issues.apache.org/jira/browse/DRILL-2504
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>         Environment: 4 node cluster
> {code}
> 0: jdbc:drill:> select * from sys.version;
> +------------+----------------+-------------+-------------+------------+
> | commit_id  | commit_message | commit_time | build_email | build_time |
> +------------+----------------+-------------+-------------+------------+
> | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown     | 09.03.2015 @ 04:52:49 EDT |
> +------------+----------------+-------------+-------------+------------+
> 1 row selected (0.062 seconds)
> {code}
>            Reporter: Khurram Faraaz
>            Assignee: Chris Westin
>
> The aggregate query below, with GROUP BY over distinct/non-distinct data, results in an exception. Note that `planner.enable_hashagg` was set to false and the query was run against a CSV file on a four-node cluster. The default can be restored afterwards with:
> alter system set `planner.enable_hashagg`=true;
> alter session set `planner.enable_hashagg`=true;
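> The effective value of the option can be confirmed from the sys.options system table (a standard Drill system table; query shown as a sketch, output omitted):
> {code}
> 0: jdbc:drill:> select * from sys.options where name = 'planner.enable_hashagg';
> {code}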
> {code}
> 0: jdbc:drill:> alter system set `planner.enable_hashagg`=false;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | planner.enable_hashagg updated. |
> +------------+------------+
> 1 row selected (0.075 seconds)
> 0: jdbc:drill:> select columns[4], sum(columns[0]), count(distinct columns[1]), max(columns[2]), count(distinct columns[3]), max(columns[5]), min(columns[6]), avg(columns[7])
> . . . . . . . > from `conftest.csv`
> . . . . . . . > group by columns[4];
> Query failed: Query stopped., Failure while trying to materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ 6cd09ba7-3e4b-4b3b-b111-39f74f53e1b0 on centos-01.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> {code}
> Stack trace from drillbit.log
> 2015-03-19 17:47:43,123 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error bab1babd-48fe-4719-8a77-dc5826027ba7: Failure while running fragment.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:390) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> 2015-03-19 17:47:43,124 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested.  RUNNING --> FAILED
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment., Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
>         at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:95) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:154) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:176) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:123) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> 2015-03-19 17:47:43,141 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested.  FAILED --> COMPLETED
> 2015-03-19 17:47:43,141 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> 2015-03-19 17:47:43,142 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] ERROR o.a.drill.exec.work.foreman.Foreman - Error 82bd01dc-d27d-47e4-836d-426f3625511f: RemoteRpcException: Failure while running fragment., Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment., Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> [ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
> at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:95) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:154) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:176) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:123) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
> {code}
> Physical plan for the failing query
> {code}
> 00-00    Screen : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {832.95 rows, 13844.73454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2049
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], EXPR$7=[$7]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {830.0 rows, 13841.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2048
> 00-02        Project(EXPR$0=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], EXPR$2=[$11], EXPR$3=[$3], EXPR$4=[$9], EXPR$5=[$4], EXPR$6=[$5], EXPR$7=[CAST(/(CastHigh(CASE(=($7, 0), null, $6)), $7)):ANY]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {800.5 rows, 13809.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2047
> 00-03          MergeJoin(condition=[IS NOT DISTINCT FROM($0, $10)], joinType=[inner]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7, ANY EXPR$00, BIGINT EXPR$4, ANY EXPR$01, BIGINT EXPR$2): rowcount = 29.5, cumulative cost = {771.0 rows, 13777.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2046
> 00-05            MergeJoin(condition=[IS NOT DISTINCT FROM($0, $8)], joinType=[inner]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7, ANY EXPR$00, BIGINT EXPR$4): rowcount = 29.5, cumulative cost = {491.70000000000005 rows, 10177.344151873782 cpu, 0.0 io, 0.0 network, 4814.4 memory}, id = 2038
> 00-08              StreamAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)], EXPR$3=[MAX($3)], EXPR$5=[MAX($5)], EXPR$6=[MIN($6)], agg#5=[$SUM0($7)], agg#6=[COUNT($7)]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7): rowcount = 5.9, cumulative cost = {236.0 rows, 6671.303759649394 cpu, 0.0 io, 0.0 network, 3776.0 memory}, id = 2030
> 00-11                Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f1, ANY $f2, ANY $f3, ANY $f4, ANY $f5, ANY $f6, ANY $f7): rowcount = 59.0, cumulative cost = {177.0 rows, 1479.3037596493946 cpu, 0.0 io, 0.0 network, 3776.0 memory}, id = 2029
> 00-14                  Project(EXPR$0=[ITEM($0, 4)], $f1=[ITEM($0, 0)], $f2=[ITEM($0, 1)], $f3=[ITEM($0, 2)], $f4=[ITEM($0, 3)], $f5=[ITEM($0, 5)], $f6=[ITEM($0, 6)], $f7=[ITEM($0, 7)]) : rowType = RecordType(ANY EXPR$0, ANY $f1, ANY $f2, ANY $f3, ANY $f4, ANY $f5, ANY $f6, ANY $f7): rowcount = 59.0, cumulative cost = {118.0 rows, 91.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2028
> 00-17                    Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], `columns`[0], `columns`[1], `columns`[2], `columns`[3], `columns`[5], `columns`[6], `columns`[7]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2027
> 00-07              Project(EXPR$00=[$0], EXPR$4=[$1]) : rowType = RecordType(ANY EXPR$00, BIGINT EXPR$4): rowcount = 1.0, cumulative cost = {248.8 rows, 3478.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2037
> 00-10                StreamAgg(group=[{0}], EXPR$4=[COUNT($1)]) : rowType = RecordType(ANY EXPR$0, BIGINT EXPR$4): rowcount = 1.0, cumulative cost = {247.8 rows, 3470.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2036
> 00-13                  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 5.9, cumulative cost = {241.9 rows, 3376.040392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2035
> 00-16                    StreamAgg(group=[{0, 1}]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 5.9, cumulative cost = {236.0 rows, 3315.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2034
> 00-19                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 59.0, cumulative cost = {177.0 rows, 2843.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2033
> 00-21                        Project(EXPR$0=[ITEM($0, 4)], $f4=[ITEM($0, 3)]) : rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 59.0, cumulative cost = {118.0 rows, 67.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2032
> 00-22                          Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], `columns`[3]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2031
> 00-04            Project(EXPR$01=[$0], EXPR$2=[$1]) : rowType = RecordType(ANY EXPR$01, BIGINT EXPR$2): rowcount = 1.0, cumulative cost = {248.8 rows, 3478.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2045
> 00-06              StreamAgg(group=[{0}], EXPR$2=[COUNT($1)]) : rowType = RecordType(ANY EXPR$0, BIGINT EXPR$2): rowcount = 1.0, cumulative cost = {247.8 rows, 3470.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2044
> 00-09                Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 5.9, cumulative cost = {241.9 rows, 3376.040392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2043
> 00-12                  StreamAgg(group=[{0, 1}]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 5.9, cumulative cost = {236.0 rows, 3315.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2042
> 00-15                    Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 59.0, cumulative cost = {177.0 rows, 2843.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2041
> 00-18                      Project(EXPR$0=[ITEM($0, 4)], $f2=[ITEM($0, 1)]) : rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 59.0, cumulative cost = {118.0 rows, 67.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2040
> 00-20                        Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], `columns`[1]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2039
> {code}
> A query over a view with explicit casts returns correct results
> {code}
> 0: jdbc:drill:> create view v8(col_decimal, col_int, col_bigint, col_char, col_varchar, col_timestmp, col_float, col_double)
> . . . . . . . > as
> . . . . . . . > select cast(columns[0] as decimal(11,3)),
> . . . . . . . > cast(columns[1] as int),
> . . . . . . . > cast(columns[2] as bigint),
> . . . . . . . > cast(columns[3] as char(9)),
> . . . . . . . > cast(columns[4] as varchar(256)),
> . . . . . . . > cast(columns[5] as timestamp),
> . . . . . . . > cast(columns[6] as float),
> . . . . . . . > cast(columns[7] as double)
> . . . . . . . > from `conftest.csv`;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | View 'v8' created successfully in 'dfs.tmp' schema |
> +------------+------------+
> 1 row selected (0.102 seconds)
> 0: jdbc:drill:> select col_varchar, sum(col_decimal), count(distinct col_int)
> . . . . . . . > from v8
> . . . . . . . > group by col_varchar;
> 209 rows selected (0.713 seconds)
> {code}
> 1. The original query over the VIEW returns correct results.
> 2. The original query over the CSV file, with explicit casts in the SELECT list, also returns correct results.
> So the problem appears only when the original aggregate + GROUP BY query is run over the CSV file without explicitly casting the columns.
> {code}
> 0: jdbc:drill:> select col_varchar, sum(col_decimal), count(distinct col_int), max(col_bigint), count(distinct col_char), max(col_timestmp), min(col_float), avg(col_double)
> . . . . . . . > from v8
> . . . . . . . > group by col_varchar;
> +-------------+------------+------------+------------+------------+------------+------------+------------+
> | col_varchar |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
> +-------------+------------+------------+------------+------------+------------+------------+------------+
> 209 rows selected (1.286 seconds)
> The below query returns correct results
> 0: jdbc:drill:> select columns[4], sum(cast(columns[0] as decimal(11,3))), count(distinct cast(columns[1] as int)), max(cast(columns[2] as bigint)), count(distinct cast(columns[3] as char(9))), max(cast(columns[5] as timestamp)), min(cast(columns[6] as float)), avg(cast(columns[7] as double))
> . . . . . . . > from `conftest.csv`
> . . . . . . . > group by columns[4];
> 209 rows selected (1.129 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)