You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by jinfengni <gi...@git.apache.org> on 2016/08/05 04:19:11 UTC

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/559

    DRILL-4825: Fix incorrect result issue caused by partition pruning wh\u2026

    \u2026en same tables are queried multiple times with different filters in query.
    
    1) Introduce DirPrunedEnumerableTableScan which will take file selection as part of digest.
    2) When directory-based pruning happens, create instance of DirPrunedEnumerableTableScan.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-4825

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #559
    
----
commit 79b1602933538d98c301afd621deafe8e8f4e79b
Author: Jinfeng Ni <jn...@apache.org>
Date:   2016-08-05T00:54:30Z

    DRILL-4825: Fix incorrect result issue caused by partition pruning when same tables are queried multiple times with different filters in query.
    
    1) Introduce DirPrunedEnumerableTableScan which will take file selection as part of digest.
    2) When directory-based pruning happens, create instance of DirPrunedEnumerableTableScan.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73725279
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DirPrunedEnumerableTableScan.java ---
    @@ -0,0 +1,73 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.exec.planner.logical;
    +
    +import com.google.common.base.Supplier;
    +import com.google.common.collect.ImmutableList;
    +import org.apache.calcite.adapter.enumerable.EnumerableConvention;
    +import org.apache.calcite.adapter.enumerable.EnumerableTableScan;
    +import org.apache.calcite.plan.RelOptCluster;
    +import org.apache.calcite.plan.RelOptTable;
    +import org.apache.calcite.plan.RelTraitSet;
    +import org.apache.calcite.rel.RelCollation;
    +import org.apache.calcite.rel.RelCollationTraitDef;
    +import org.apache.calcite.rel.RelWriter;
    +import org.apache.calcite.schema.Table;
    +
    +import java.util.List;
    +
    +/**
    + * This class extends from EnumerableTableScan. It puts the file selection string into it's digest.
    + * When directory-based partition pruning applied, file selection could be different for the same
    + * table.
    + */
    +public class DirPrunedEnumerableTableScan extends EnumerableTableScan {
    +  private final String digestFromSelection;
    +
    +  public DirPrunedEnumerableTableScan(RelOptCluster cluster, RelTraitSet traitSet,
    --- End diff --
    
    Good suggestion. I made the change. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/559


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #559: DRILL-4825: Fix incorrect result issue caused by partition...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/559
  
    @amansinha100 , could you please review this patch?
    
    Unit test / functional regression have passed. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #559: DRILL-4825: Fix incorrect result issue caused by partition...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/559
  
    @amansinha100 , thanks for your comments. Could you please take another look?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73725525
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
    @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception {
         test("alter session set `planner.in_subquery_threshold` = 10");
         testExcludeFilter(query, 4, "Filter", 40);
       }
    +
    +
    +  @Test // DRILL-4825: querying same table with different filter in UNION ALL.
    +  public void testPruneSameTableInUnionAll() throws Exception {
    +    final String query = String.format("select count(*) as cnt from "
    +        + "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all "
    +        + "  select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )",
    +        TEST_RES_PATH, TEST_RES_PATH);
    +
    +    String [] exclued = {"Filter"};
    --- End diff --
    
    modified. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73649304
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
    @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception {
         test("alter session set `planner.in_subquery_threshold` = 10");
         testExcludeFilter(query, 4, "Filter", 40);
       }
    +
    +
    +  @Test // DRILL-4825: querying same table with different filter in UNION ALL.
    +  public void testPruneSameTableInUnionAll() throws Exception {
    +    final String query = String.format("select count(*) as cnt from "
    +        + "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all "
    +        + "  select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )",
    +        TEST_RES_PATH, TEST_RES_PATH);
    +
    +    String [] exclued = {"Filter"};
    +
    +    // verify plan that filter is applied in partition pruning.
    +    testPlanMatchingPatterns(query, null, exclued);
    +
    +    // verify we get correct count(*).
    +    testBuilder()
    +        .sqlQuery(query)
    +        .unOrdered()
    +        .baselineColumns("cnt")
    +        .baselineValues((long)120)
    +        .build()
    +        .run();
    +  }
    +
    +  @Test // DRILL-4825: querying same table with different filter in Join.
    +  public void testPruneSameTableInJoin() throws Exception {
    +    final String query = String.format("select *  from "
    +            + "( select sum(o_custkey) as x from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') ) join "
    +            + " ( select sum(o_custkey) as y from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996')) "
    +            + " on x = y ",
    +        TEST_RES_PATH, TEST_RES_PATH);
    +
    +    String [] exclued = {"Filter"};
    --- End diff --
    
    same as above


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73649603
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
    @@ -402,4 +402,18 @@ public String getCacheFileRoot() {
         return cacheFileRoot;
       }
     
    +  @Override
    +  public String toString() {
    +    final StringBuilder sb = new StringBuilder();
    +    sb.append("root=" + this.selectionRoot);
    +
    +    sb.append("files=[");
    +    for (final String file : this.files) {
    +      sb.append(file);
    --- End diff --
    
    Is this method only used internally or would it show up in the Explain output ?  If latter, you may add a comma or space after each file name. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73649293
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
    @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception {
         test("alter session set `planner.in_subquery_threshold` = 10");
         testExcludeFilter(query, 4, "Filter", 40);
       }
    +
    +
    +  @Test // DRILL-4825: querying same table with different filter in UNION ALL.
    +  public void testPruneSameTableInUnionAll() throws Exception {
    +    final String query = String.format("select count(*) as cnt from "
    +        + "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all "
    +        + "  select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )",
    +        TEST_RES_PATH, TEST_RES_PATH);
    +
    +    String [] exclued = {"Filter"};
    --- End diff --
    
    'excluded'  (missing 'd')


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #559: DRILL-4825: Fix incorrect result issue caused by partition...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on the issue:

    https://github.com/apache/drill/pull/559
  
    LGTM.  +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73725484
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
    @@ -402,4 +402,18 @@ public String getCacheFileRoot() {
         return cacheFileRoot;
       }
     
    +  @Override
    +  public String toString() {
    +    final StringBuilder sb = new StringBuilder();
    +    sb.append("root=" + this.selectionRoot);
    +
    +    sb.append("files=[");
    +    for (final String file : this.files) {
    +      sb.append(file);
    --- End diff --
    
    It's used only internally for now, since Drill's Explain output only generates physical plan. 
    
    You suggestion makes sense. I add a comma after each file name. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73649124
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DirPrunedEnumerableTableScan.java ---
    @@ -0,0 +1,73 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.exec.planner.logical;
    +
    +import com.google.common.base.Supplier;
    +import com.google.common.collect.ImmutableList;
    +import org.apache.calcite.adapter.enumerable.EnumerableConvention;
    +import org.apache.calcite.adapter.enumerable.EnumerableTableScan;
    +import org.apache.calcite.plan.RelOptCluster;
    +import org.apache.calcite.plan.RelOptTable;
    +import org.apache.calcite.plan.RelTraitSet;
    +import org.apache.calcite.rel.RelCollation;
    +import org.apache.calcite.rel.RelCollationTraitDef;
    +import org.apache.calcite.rel.RelWriter;
    +import org.apache.calcite.schema.Table;
    +
    +import java.util.List;
    +
    +/**
    + * This class extends from EnumerableTableScan. It puts the file selection string into it's digest.
    + * When directory-based partition pruning applied, file selection could be different for the same
    + * table.
    + */
    +public class DirPrunedEnumerableTableScan extends EnumerableTableScan {
    +  private final String digestFromSelection;
    +
    +  public DirPrunedEnumerableTableScan(RelOptCluster cluster, RelTraitSet traitSet,
    --- End diff --
    
    Besides the constructor and the create() methods, there's also the copy() method that I think should be overridden.  Several parts of the code make a copy of EnumerableTableScan and they will not see the digestFromSelection. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #559: DRILL-4825: Fix incorrect result issue caused by pa...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/559#discussion_r73725537
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
    @@ -386,4 +386,49 @@ public void testPartitionFilterWithInSubquery() throws Exception {
         test("alter session set `planner.in_subquery_threshold` = 10");
         testExcludeFilter(query, 4, "Filter", 40);
       }
    +
    +
    +  @Test // DRILL-4825: querying same table with different filter in UNION ALL.
    +  public void testPruneSameTableInUnionAll() throws Exception {
    +    final String query = String.format("select count(*) as cnt from "
    +        + "( select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') union all "
    +        + "  select dir0 from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996') )",
    +        TEST_RES_PATH, TEST_RES_PATH);
    +
    +    String [] exclued = {"Filter"};
    +
    +    // verify plan that filter is applied in partition pruning.
    +    testPlanMatchingPatterns(query, null, exclued);
    +
    +    // verify we get correct count(*).
    +    testBuilder()
    +        .sqlQuery(query)
    +        .unOrdered()
    +        .baselineColumns("cnt")
    +        .baselineValues((long)120)
    +        .build()
    +        .run();
    +  }
    +
    +  @Test // DRILL-4825: querying same table with different filter in Join.
    +  public void testPruneSameTableInJoin() throws Exception {
    +    final String query = String.format("select *  from "
    +            + "( select sum(o_custkey) as x from dfs_test.`%s/multilevel/parquet` where dir0 in ('1994') ) join "
    +            + " ( select sum(o_custkey) as y from dfs_test.`%s/multilevel/parquet` where dir0 in ('1995', '1996')) "
    +            + " on x = y ",
    +        TEST_RES_PATH, TEST_RES_PATH);
    +
    +    String [] exclued = {"Filter"};
    --- End diff --
    
    modified. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---