Posted to dev@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2016/01/08 12:52:39 UTC

[jira] [Created] (DRILL-4255) SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION

Khurram Faraaz created DRILL-4255:
-------------------------------------

             Summary: SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION
                 Key: DRILL-4255
                 URL: https://issues.apache.org/jira/browse/DRILL-4255
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.4.0
         Environment: CentOS
            Reporter: Khurram Faraaz


A SELECT DISTINCT query over MapR-FS generated audit logs (JSON files) fails with an UNSUPPORTED_OPERATION error. An identical query over another set of JSON data returns correct results.

MapR Drill 1.4.0, commit ID: 9627a80f
MapRBuildVersion: 5.1.0.36488.GA
OS: CentOS x86_64 GNU/Linux

{noformat}
0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

Fragment 3:3

[Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 on example.com:31010] (state=,code=0)
{noformat}
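
The audit-log files themselves are not attached, so the exact trigger is unconfirmed, but Drill raises this error when the schema of the incoming record batches changes mid-query. With JSON that typically happens when the same field holds different scalar types in different files, or is absent from some files (an absent column is materialized as nullable INT). A minimal hypothetical sketch of data (file names and contents are illustrative only) that would change the schema of `operation` between files:

{noformat}
file1.json (hypothetical contents; operation is read as VARCHAR):
{"operation": "LOOKUP", "uid": 0}

file2.json (hypothetical contents; operation is read as BIGINT, so the scan emits a new schema):
{"operation": 17, "uid": 0}
{noformat}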

Stack trace from drillbit.log:

{noformat}
2016-01-08 11:35:35,093 [297060f9-1c7a-b32c-09e8-24b5ad863e73:frag:3:3] INFO  o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes


[Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_65]
        at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_65]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1506.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
{noformat}
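
If mixed scalar types are indeed the cause, one possible workaround (not verified against this data set) is to make the JSON reader return every scalar value as VARCHAR, so the batch schema stays stable across files:

{noformat}
0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode` = true;
0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
{noformat}

Note that with all_text_mode every column comes back as VARCHAR, so any numeric comparisons elsewhere in the query would need explicit casts.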

Query plan for the above query:
{noformat}
00-00    Screen : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.4100499276E7 rows, 1.69455861396E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7572
00-01      UnionExchange : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.408635556E7 rows, 1.6944171768E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7571
01-01        Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7570
01-02          HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7569
01-03            Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7568
01-04              HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7567
02-01                UnorderedMuxExchange : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.1116175200000003E7 rows, 1.34365302E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7566
03-01                  Project(operation=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {2.97018036E7 rows, 1.329509304E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7565
03-02                    HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {2.8287432E7 rows, 1.27293444E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7564
03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/auditlogs, numFiles=31, columns=[`operation`], files=[maprfs:/tmp/auditlogs/DBAudit.log-2015-12-30-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-002.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-30-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-003.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-04-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-002.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-03-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-29-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-01-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-29-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-01-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-004.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-07-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-08-001.json]]]) : rowType = RecordType(ANY operation): rowcount = 1.4143716E7, cumulative cost = {1.4143716E7 rows, 1.4143716E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7563
{noformat}
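
The plan contains two HashAgg operators (01-02 and 03-02); fragment 3:3, where the error is reported, runs the HashAgg just above the scan. As a further experiment (also unverified here), hash aggregation can be disabled so the planner falls back to a sort-based streaming aggregate; if the query then fails in that path as well, it would confirm the schema change originates in the scan rather than in the hash aggregate itself:

{noformat}
0: jdbc:drill:schema=dfs.tmp> alter session set `planner.enable_hashagg` = false;
{noformat}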

Another query of exactly the same form as the failing query reported here, run over a different JSON data set, returns correct results:

{noformat}
0: jdbc:drill:schema=dfs.tmp> select distinct t.key2 from `twoKeyJsn.json` t;
+-------+
| key2  |
+-------+
| d     |
| c     |
| b     |
| 1     |
| a     |
| 0     |
| k     |
| m     |
| j     |
| h     |
| e     |
| n     |
| g     |
| f     |
| l     |
| i     |
+-------+
16 rows selected (27.097 seconds)
{noformat}
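
Assuming the typeof() function is available in this build, the suspected type mix in the failing data set could be checked directly; since typeof() always returns VARCHAR, the aggregate input schema stays constant and the query should not hit the schema-change check:

{noformat}
0: jdbc:drill:schema=dfs.tmp> select distinct typeof(t.operation) from `auditlogs` t;
{noformat}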


