Posted to issues@drill.apache.org by "Volodymyr Vysotskyi (JIRA)" <ji...@apache.org> on 2018/11/12 12:43:00 UTC

[jira] [Commented] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low

    [ https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683739#comment-16683739 ] 

Volodymyr Vysotskyi commented on DRILL-6839:
--------------------------------------------

[~amansinha100], is it possible to modify {{StreamAggPrule}} to create either a two-phase aggregation or, when a two-phase one cannot be created, a single-phase one, similar to [{{ProjectPrule}}|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L68] and other rules?
Or was {{StreamAggPrule}} implemented this way because, if the plan had both single- and two-phase aggregations, the single-phase one would have the lower cost?
If so, can we adjust the cost calculations to account for this, depending on the row count and the values of the broadcast options?
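
To make the question concrete, here is a toy model of the two behaviours, assuming the reading that the rule should fall back to single-phase when two-phase cannot be created. All names ({{AggPhaseSketch}}, {{AggPlan}}, {{withoutFallback}}, {{withFallback}}) are hypothetical and this is not Drill planner code; it only mirrors the decision as the issue description and workaround describe it.

```java
import java.util.Optional;

// Toy model only: every name here is hypothetical and none of it is Drill planner code.
final class AggPhaseSketch {

    enum AggPlan { TWO_PHASE, SINGLE_PHASE }

    // Behaviour as the issue describes it (an assumption, not verified against the rule):
    // with multiphase aggregation enabled, only the two-phase plan is attempted,
    // so a failed attempt leaves the aggregate without a physical plan.
    static Optional<AggPlan> withoutFallback(boolean multiphaseEnabled, boolean twoPhaseCreatable) {
        if (!multiphaseEnabled) {
            return Optional.of(AggPlan.SINGLE_PHASE);
        }
        return twoPhaseCreatable ? Optional.of(AggPlan.TWO_PHASE) : Optional.empty();
    }

    // Behaviour the comment asks about: use single-phase when two-phase cannot be created.
    static AggPlan withFallback(boolean multiphaseEnabled, boolean twoPhaseCreatable) {
        return withoutFallback(multiphaseEnabled, twoPhaseCreatable).orElse(AggPlan.SINGLE_PHASE);
    }

    public static void main(String[] args) {
        System.out.println(withoutFallback(true, false)); // prints: Optional.empty
        System.out.println(withFallback(true, false));    // prints: SINGLE_PHASE
    }
}
```

In this sketch the fallback never changes the result when a two-phase plan can be created; it only fills the gap that otherwise leaves the planner with nothing to choose from.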

> Failed to plan (aggregate + Hash or NL join) when slice target is low 
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6839
>                 URL: https://issues.apache.org/jira/browse/DRILL-6839
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Igor Guzenko
>            Priority: Major
>             Fix For: 1.16.0
>
>
> *Case 1.* When a nested loop join is about to be used:
>  - Option "_planner.enable_nljoin_for_scalar_only_" is set to false
>  - Option "_planner.slice_target_" is set to a low value to imitate big input tables
>  
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>   @BeforeClass
>   public static void setUp() throws Exception {
>     startCluster(ClusterFixture.builder(dirTestWatcher));
>   }
>
>   @Test
>   public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
>     try {
>       client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
>       client.alterSession(ExecConstants.SLICE_TARGET, 1);
>       queryBuilder().sql(
>           "SELECT COUNT(l.nation_id) " +
>           "FROM cp.`tpch/nation.parquet` l " +
>           ", cp.`tpch/region.parquet` r")
>         .run();
>     } finally {
>       client.resetSession(ExecConstants.SLICE_TARGET);
>       client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
>     }
>   }
> }
> {code}
>  
> *Case 2.* When a hash join is about to be used:
>  - Option "_planner.enable_mergejoin_" is set to false, so a hash join will be used instead
>  - Option "_planner.slice_target_" is set to a low value to imitate big input tables
>  - The line ruleList.add(HashJoinPrule.DIST_INSTANCE); is commented out in the PlannerPhase.getPhysicalRules method
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>   @BeforeClass
>   public static void setUp() throws Exception {
>     startCluster(ClusterFixture.builder(dirTestWatcher));
>   }
>
>   @Test
>   public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
>     try {
>       client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
>       client.alterSession(ExecConstants.SLICE_TARGET, 1);
>       queryBuilder().sql(
>           "SELECT COUNT(l.nation_id) " +
>           "FROM cp.`tpch/nation.parquet` l " +
>           "INNER JOIN cp.`tpch/region.parquet` r " +
>           "ON r.nation_id = l.nation_id")
>         .run();
>     } finally {
>       client.resetSession(ExecConstants.SLICE_TARGET);
>       client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
>     }
>   }
> }
> {code}
>  
> *Workaround:* To avoid the exception, set the option "_planner.enable_multiphase_agg_" to false. This skips the unsuccessful attempts to create a two-phase aggregation plan in StreamAggPrule and guarantees that the logical aggregate is converted to a physical one.
>  
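
For readers applying the workaround outside the test fixture, the option can also be set per session with an {{ALTER SESSION SET}} statement. The sketch below only assembles that statement as a string; the class and method names ({{MultiphaseAggWorkaround}}, {{alterSessionSql}}) are hypothetical helpers, not part of Drill, and how the statement is submitted (JDBC, the shell, or the test client) is left open.

```java
// Hypothetical helper, not part of Drill: it only assembles the SQL text.
final class MultiphaseAggWorkaround {

    // Option name taken from the workaround in the issue description.
    static final String MULTIPHASE_AGG_OPTION = "planner.enable_multiphase_agg";

    // Builds an ALTER SESSION SET statement for a boolean option,
    // quoting the option name with backticks.
    static String alterSessionSql(String optionName, boolean value) {
        return "ALTER SESSION SET `" + optionName + "` = " + value;
    }

    public static void main(String[] args) {
        // prints: ALTER SESSION SET `planner.enable_multiphase_agg` = false
        System.out.println(alterSessionSql(MULTIPHASE_AGG_OPTION, false));
    }
}
```

Because this changes the option for the current session only, it has the same scope as the {{client.alterSession(...)}} calls in the test cases.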


