You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Bridget Bevens (JIRA)" <ji...@apache.org> on 2018/12/04 01:07:00 UTC
[jira] [Commented] (DRILL-786) Implement CROSS JOIN

    [ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708037#comment-16708037 ] 

Bridget Bevens commented on DRILL-786:
--------------------------------------

Hi [~IhorHuzenko] and [~vvysotskyi],

I've updated the following doc sections with info for CROSS JOIN support:
 * [https://drill.apache.org/docs/from-clause/#join-types] (I also included a cross join example in the Examples section)
 * [https://drill.apache.org/docs/select/#joins]

Please review the updates and let me know if I need to make any changes.

Thanks!
 Bridget

> Implement CROSS JOIN
> --------------------
>
>                 Key: DRILL-786
>                 URL: https://issues.apache.org/jira/browse/DRILL-786
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning &amp; Optimization
>    Affects Versions: 1.14.0
>            Reporter: Krystal
>            Assignee: Igor Guzenko
>            Priority: Major
>              Labels: doc-complete, ready-to-commit
>             Fix For: 1.15.0
>
>
> *For documentation:*
> Due to it's nature cross joins can produce extremely large results, and we don't recommend to use the feature if you don't know that results won't cause out of memory errors. That's why cross joins are disabled by default, to allow explicit cross join syntax you'll have to enable it by setting  planner.enable_nljoin_for_scalar_only option to false. There is also another limitation related to usage of aggregation function over cross join relation. When input row count for aggregate function is bigger than value of planner.slice_target option then query can't be planned (because 2 phase aggregation can't be created in such case), as a workaround you should set planner.enable_multiphase_agg to false. This limitation will be active until fix of https://issues.apache.org/jira/browse/DRILL-6839. 
> ---------------------------------------------------------------------------------------------------------------------
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, student.studentnum from student cross join voter where student.age = 20 and voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, importance=0.5904900000000001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#337:AbstractConverter.LOGICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#332:Subset#22.PHYSICAL.ANY([]).[], best=rel#335, importance=0.531441
> rel#334:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#338:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> rel#336:Subset#22.PHYSICAL.SINGLETON([]).[], best=rel#335, importance=0.4782969000000001
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> Set#23, type: (DrillRecordRow[*, age, name, studentnum])
> rel#308:Subset#23.LOGICAL.ANY([]).[], best=rel#307, importance=0.6561
> rel#307:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 network}
> rel#343:AbstractConverter.LOGICAL.ANY([]).[](child=rel#342:Subset#23.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=150.0, cumulative cost={inf}
> rel#342:Subset#23.PHYSICAL.SINGLETON([]).[], best=rel#341, importance=0.5904900000000001
> rel#344:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#308:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=150.0, cumulative cost={inf}
> rel#341:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 network}
> Set#24, type: (DrillRecordRow[*, age])
> rel#309:Subset#24.LOGICAL.ANY([]).[], best=rel#140, importance=0.5904900000000001
> rel#140:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, voter]), rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
> rel#330:AbstractConverter.LOGICAL.ANY([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#349:AbstractConverter.LOGICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#329:Subset#24.PHYSICAL.ANY([]).[], best=rel#347, importance=0.531441
> rel#331:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#350:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
> rel#348:Subset#24.PHYSICAL.SINGLETON([]).[], best=rel#347, importance=0.4782969000000001
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
> Set#25, type: (DrillRecordRow[*, age])
> rel#311:Subset#25.LOGICAL.ANY([]).[], best=rel#310, importance=0.6561
> rel#310:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 network}
> rel#355:AbstractConverter.LOGICAL.ANY([]).[](child=rel#354:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=150.0, cumulative cost={inf}
> rel#354:Subset#25.PHYSICAL.SINGLETON([]).[], best=rel#353, importance=0.5904900000000001
> rel#356:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#311:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=150.0, cumulative cost={inf}
> rel#353:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 network}
> Set#26, type: RecordType(ANY *, ANY age, ANY name, ANY studentnum, ANY *0, ANY age0)
> rel#313:Subset#26.LOGICAL.ANY([]).[], best=rel#312, importance=0.7290000000000001
> rel#312:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#308:Subset#23.LOGICAL.ANY([]).[],right=rel#311:Subset#25.LOGICAL.ANY([]).[],condition=true,joinType=inner), rowcount=22500.0, cumulative cost={4001.0 rows, 14001.0 cpu, 0.0 io, 0.0 network}
> rel#327:AbstractConverter.LOGICAL.ANY([]).[](child=rel#326:Subset#26.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#326:Subset#26.PHYSICAL.ANY([]).[], best=null, importance=0.6561
> rel#328:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=22500.0, cumulative cost={inf}
> Set#27, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#315:Subset#27.LOGICAL.ANY([]).[], best=rel#314, importance=0.81
> rel#314:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],name=$2,age=$1,studentnum=$3), rowcount=22500.0, cumulative cost={26501.0 rows, 14013.0 cpu, 0.0 io, 0.0 network}
> rel#322:AbstractConverter.LOGICAL.ANY([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#321:Subset#27.PHYSICAL.SINGLETON([]).[], best=null, importance=0.7290000000000001
> rel#323:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=22500.0, cumulative cost={inf}
> Set#28, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#317:Subset#28.LOGICAL.ANY([]).[], best=rel#316, importance=0.9
> rel#316:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[]), rowcount=22500.0, cumulative cost={28751.0 rows, 16263.0 cpu, 0.0 io, 0.0 network}
> rel#319:AbstractConverter.LOGICAL.ANY([]).[](child=rel#318:Subset#28.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#318:Subset#28.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
> rel#320:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#317:Subset#28.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=22500.0, cumulative cost={inf}
> rel#324:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[]), rowcount=1.7976931348623157E308, cumulative cost={inf}
> org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:445) ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:669) ~[optiq-core-0.7-20140513.013236-5.jar:na]
> net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:271) ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:119) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:89) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:134) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:338) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)