You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by jinfengni <gi...@git.apache.org> on 2016/01/18 06:32:08 UTC

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/328

    Drill 2517: Apply directory-based partition pruning before reading files in planning.

    1. Run the pre-commit tests and unit tests. Some queries in pre-commit suites have changed plan. Most of the changed plan looks better than before. The only exception is for the cases caused by one existing issue (DRILL-4279), when * column is used together with SKIP_ALL mode. That happens when the filter is applied and then removed, for the following query:
    
    SELECT count(*) from T1 where dir0 = 1990 and dir1 = 'Q1'.
    
    2. I need figure out how to cherry-pick Adam's patch in DRILL-2517, since that's the initial work on this issue, although there is quite big change from that patch.  
    
    @amansinha100 , could you please take a look and give some initial review comments? Thanks!
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-2517

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #328
    
----
commit bc5e13972d116f41ed9441b49a6781b2b602c2fd
Author: Mehant Baid <me...@gmail.com>
Date:   2015-11-11T06:26:26Z

    DRILL-2571: (Prototype from Mehant) Move directory based partition pruning to logical phase.

commit 1b05e372ee7193308bea420302bdd0e259193e3a
Author: Jinfeng Ni <jn...@apache.org>
Date:   2016-01-08T18:28:53Z

    DRILL-2517: Move directory-based partition pruning to Calcite logical planning phase.
    
    1) Make directory-based pruning rule both work in calcite logical and drill logical planning phase.
    
    2) Only apply directory-based pruning in logical phase when there is no metadata cache.
    
    3) Make FileSelection constructor public, since FileSelection.create() would modify selectionRoot.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/328#issuecomment-174415824
  
    Revised patch looks good to me..other than a few minor comments.   +1. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/328#discussion_r50302251
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java ---
    @@ -84,9 +99,17 @@ public int getMaxHierarchyLevel() {
     
       @Override
       public GroupScan createNewGroupScan(List<String> newFiles) throws IOException {
    -    final FileSelection newSelection = FileSelection.create(null, newFiles, getBaseTableLocation());
    -    final FileGroupScan newScan = ((FileGroupScan)scanRel.getGroupScan()).clone(newSelection);
    -    return newScan;
    +    if (scanRel instanceof DrillScanRel) {
    +      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation());
    +      final FileGroupScan newScan = ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection);
    +      return newScan;
    +    } else {
    +      throw new UnsupportedOperationException("Does not allow to get groupScan for EnumerableTableScan");
    --- End diff --
    
    The else condition could be anything other than DrillScanRel, so you need not mention EnumerableTableScan. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #328: Drill 2517: Apply directory-based partition pruning...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni closed the pull request at:

    https://github.com/apache/drill/pull/328


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/328#discussion_r50300305
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
    @@ -78,71 +94,68 @@ public PruneScanRule(RelOptRuleOperand operand, String id, OptimizerRulesContext
         this.optimizerContext = optimizerContext;
       }
     
    -  public static final RelOptRule getFilterOnProject(OptimizerRulesContext optimizerRulesContext) {
    -    return new PruneScanRule(
    -        RelOptHelper.some(DrillFilterRel.class, RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
    -        "PruneScanRule:Filter_On_Project",
    -        optimizerRulesContext) {
    +  private static class DirPruneScanFilterOnProjectRule extends PruneScanRule {
    +    private final boolean calciteLogical;
    --- End diff --
    
    It's not clear to me why a flag is needed to keep track of Calcite logical...ideally this should be handled by the type of the RelNode. We may need to do some refactoring of this rule to separate out the ones that apply during Calcite logical planning vs. Drill logical planning.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/328#discussion_r50657159
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
    @@ -382,5 +381,28 @@ protected OptimizerRulesContext getOptimizerRulesContext() {
         return optimizerContext;
       }
     
    -  public abstract PartitionDescriptor getPartitionDescriptor(PlannerSettings settings, DrillScanRel scanRel);
    +  public abstract PartitionDescriptor getPartitionDescriptor(PlannerSettings settings, TableScan scanRel);
    +
    +  private static boolean isQualifiedDirPruning(final TableScan scan) {
    +    if (scan instanceof EnumerableTableScan) {
    +      DrillTable drillTable;
    +      drillTable = scan.getTable().unwrap(DrillTable.class);
    +      if (drillTable == null) {
    +        drillTable = scan.getTable().unwrap(DrillTranslatableTable.class).getDrillTable();
    +      }
    +      final Object selection = drillTable.getSelection();
    --- End diff --
    
    Would be good to assert drillTable != null or put an if() condition. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/328#discussion_r50656611
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java ---
    @@ -82,11 +101,19 @@ public int getMaxHierarchyLevel() {
         return MAX_NESTED_SUBDIRS;
       }
     
    -  @Override
    -  public GroupScan createNewGroupScan(List<String> newFiles) throws IOException {
    -    final FileSelection newSelection = FileSelection.create(null, newFiles, getBaseTableLocation());
    -    final FileGroupScan newScan = ((FileGroupScan)scanRel.getGroupScan()).clone(newSelection);
    -    return newScan;
    +//  @Override
    +//  public GroupScan createNewGroupScan(List<String> newFiles) throws IOException {
    --- End diff --
    
    This function can be removed completely...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: Drill 2517: Apply directory-based partition pr...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/328#discussion_r50302540
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
    @@ -78,71 +94,68 @@ public PruneScanRule(RelOptRuleOperand operand, String id, OptimizerRulesContext
         this.optimizerContext = optimizerContext;
       }
     
    -  public static final RelOptRule getFilterOnProject(OptimizerRulesContext optimizerRulesContext) {
    -    return new PruneScanRule(
    -        RelOptHelper.some(DrillFilterRel.class, RelOptHelper.some(DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
    -        "PruneScanRule:Filter_On_Project",
    -        optimizerRulesContext) {
    +  private static class DirPruneScanFilterOnProjectRule extends PruneScanRule {
    +    private final boolean calciteLogical;
    --- End diff --
    
    You are right. The flag is not needed now. I will refactor the rules. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #328: Drill 2517: Apply directory-based partition pruning before...

Posted by jinfengni <gi...@git.apache.org>.
Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/328
  
    this PR has been merged to drill 1.6.0. Close this PR. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---