Posted to issues@drill.apache.org by "Igor Guzenko (JIRA)" <ji...@apache.org> on 2018/11/09 12:39:00 UTC
[jira] [Updated] (DRILL-6839) Failed to plan (aggregate + Hash or NL join) when slice target is low

     [ https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Guzenko updated DRILL-6839:
--------------------------------
    Description: 
*Case 1.* When a nested loop join is about to be used:
 - Option "_planner.enable_nljoin_for_scalar_only_" is set to false
 - Option "_planner.slice_target_" is set to a low value to imitate large input tables

 
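{code:java}
// Imports assume Drill's standard test-framework packages.
import org.apache.drill.categories.SqlTest;
import org.apache.drill.exec.ExecConstants;
import org.apache.drill.exec.planner.physical.PlannerSettings;
import org.apache.drill.test.ClusterFixture;
import org.apache.drill.test.ClusterTest;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.experimental.categories.Category;

@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {
  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      // Allow nested loop join for more than scalar subqueries.
      client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
      // A low slice target imitates big input tables.
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          ", cp.`tpch/region.parquet` r")
        .run();
    } finally {
      // Restore both session options so other tests are unaffected.
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
    }
  }
}
{code}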
 

*Case 2.* When a hash join is about to be used:
 - Option "planner.enable_mergejoin" is set to false, so a hash join is used instead
 - Option "planner.slice_target" is set to a low value to imitate large input tables
 - The line ruleList.add(HashJoinPrule.DIST_INSTANCE); is commented out in the PlannerPhase.getPhysicalRules method
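{code:java}
// Imports assume Drill's standard test-framework packages.
import org.apache.drill.categories.SqlTest;
import org.apache.drill.exec.ExecConstants;
import org.apache.drill.exec.planner.physical.PlannerSettings;
import org.apache.drill.test.ClusterFixture;
import org.apache.drill.test.ClusterTest;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.experimental.categories.Category;

@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {
  @BeforeClass
  public static void setUp() throws Exception {
    startCluster(ClusterFixture.builder(dirTestWatcher));
  }

  @Test
  public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
    try {
      // Disable merge join so that hash join is used instead.
      client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
      // A low slice target imitates big input tables.
      client.alterSession(ExecConstants.SLICE_TARGET, 1);
      queryBuilder().sql(
          "SELECT COUNT(l.nation_id) " +
          "FROM cp.`tpch/nation.parquet` l " +
          "INNER JOIN cp.`tpch/region.parquet` r " +
          "ON r.nation_id = l.nation_id")
        .run();
    } finally {
      // Restore both session options so other tests are unaffected.
      client.resetSession(ExecConstants.SLICE_TARGET);
      client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
    }
  }
}
{code}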
 

*Workaround:* To avoid the exception, set option "_planner.enable_multiphase_agg_" to false. This skips the unsuccessful attempts to create a two-phase aggregation plan in StreamAggPrule and guarantees that the logical aggregate is converted to a physical one.
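
As a rough illustration of the workaround, the sketch below builds the session-level statement that turns the option off. The class MultiphaseAggWorkaround and its methods are hypothetical helpers, not part of Drill; only the option name comes from the description above, and the statement follows Drill's ALTER SESSION SET syntax.

```java
// Hypothetical helper (not part of Drill): formats the ALTER SESSION
// statement that applies the workaround described above.
public final class MultiphaseAggWorkaround {
  // Option name as quoted in the issue description.
  public static final String MULTIPHASE_AGG = "planner.enable_multiphase_agg";

  private MultiphaseAggWorkaround() {
  }

  // Builds an ALTER SESSION SET statement for a boolean option.
  public static String alterSession(String option, boolean value) {
    return "ALTER SESSION SET `" + option + "` = " + value;
  }

  // Statement that disables two-phase aggregation for the current session.
  public static String disableMultiphaseAgg() {
    return alterSession(MULTIPHASE_AGG, false);
  }

  public static void main(String[] args) {
    // Print the statement so it can be pasted into a Drill client.
    System.out.println(disableMultiphaseAgg());
  }
}
```

Inside a test like the ones above, the same effect could presumably be had with client.alterSession("planner.enable_multiphase_agg", false), mirroring the other alterSession calls, with a matching client.resetSession in the finally block.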

 

  was:
Case 1. When nested loop join is about to be used:
-Option "planner.enable_nljoin_for_scalar_only" is set to false
-Option "planner.slice_target" is set to low value for imitation of big input tables

 
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {
 @BeforeClass
 public static void setUp() throws Exception {
 startCluster(ClusterFixture.builder(dirTestWatcher));
 }

 @Test
 public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
   try {
     client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
     client.alterSession(ExecConstants.SLICE_TARGET, 1);
     queryBuilder().sql(
        "SELECT COUNT(l.nation_id) " +
        "FROM cp.`tpch/nation.parquet` l " +
        "CROSS JOIN cp.`tpch/region.parquet` r")
     .run();
   } finally {
    client.resetSession(ExecConstants.SLICE_TARGET);
    client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
   }
 }
}{code}
 

Case 2. When hash join is about to be used:
- Option "planner.enable_mergejoin" is set to false, so hash join will be used instead
- Option "planner.slice_target" is set to low value for imitation of big input tables
- Comment out //ruleList.add(HashJoinPrule.DIST_INSTANCE); in PlannerPhase.getPhysicalRules method
{code:java}
@Category(SqlTest.class)
public class CrossJoinTest extends ClusterTest {
 @BeforeClass
 public static void setUp() throws Exception {
   startCluster(ClusterFixture.builder(dirTestWatcher));
 }

 @Test
 public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
   try {
    client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
    client.alterSession(ExecConstants.SLICE_TARGET, 1);
    queryBuilder().sql(
      "SELECT COUNT(l.nation_id) " +
      "FROM cp.`tpch/nation.parquet` l " +
      "INNER JOIN cp.`tpch/region.parquet` r " +
      "ON r.nation_id = l.nation_id")
    .run();
   } finally {
    client.resetSession(ExecConstants.SLICE_TARGET);
    client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
   }
 }
}
{code}
*Workaround:* To avoid the exception we need to set option
"planner.enable_multiphase_agg" to false. By doing this we avoid unsuccessful attempts to create 2 phase aggregation plan 
in StreamAggPrule and guarantee that logical aggregate will be converted to physical one. 

 


> Failed to plan (aggregate + Hash or NL join) when slice target is low 
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6839
>                 URL: https://issues.apache.org/jira/browse/DRILL-6839
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Igor Guzenko
>            Assignee: Igor Guzenko
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)