You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@carbondata.apache.org by kevinjmh <gi...@git.apache.org> on 2018/08/28 12:29:30 UTC

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

GitHub user kevinjmh opened a pull request:

    https://github.com/apache/carbondata/pull/2665

    [CARBONDATA-2897][DataMap] Optimize datamap chooser

    In this PR,
    1. Remove code for merging into one datamap when some datamap hits both child nodes of And/Or expression in DataMapChooser. This aims to make datamap focus on pruning single index column without any logic process. Leave logic stuff to be done by AndDataMapExprWrapper and OrDataMapExprWrapper
    2. Only extract ColumnExpression of Expression which our datamap can handle in DataMapChooser.
    3. Add short circuit to return pruned result when result of left node is empty in AndDataMapExprWrapper.
    
    
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata dmChooser

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2665
    
----
commit 2c03e75f7d77df7609f2b054dcf93bdd7256bfa8
Author: Manhua <ke...@...>
Date:   2018-08-28T11:51:52Z

    datamap chooser

----


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/15/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/343/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8593/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/458/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/51/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/390/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/523/



---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r219378217
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---
    @@ -123,6 +129,42 @@ public BloomCoarseGrainDataMapFactory(CarbonTable carbonTable, DataMapSchema dat
         }
       }
     
    +  @Override
    +  public boolean isSupport(Expression expression) {
    +    ColumnExpression filterColumn = null;
    +    // First check type of child nodes, and get filter column name
    +    switch (expression.getFilterExpressionType()) {
    +      case IN:
    +        InExpression inExpr = (InExpression) expression;
    +        if (inExpr.getLeft() instanceof ColumnExpression &&
    +            inExpr.getRight() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getLeft();
    +        } else if (inExpr.getRight() instanceof ColumnExpression &&
    +            inExpr.getLeft() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getRight();
    +        }
    +        break;
    +      case EQUALS:
    +        EqualToExpression equalToExpr = (EqualToExpression) expression;
    +        if (equalToExpr.getLeft() instanceof ColumnExpression &&
    +            equalToExpr.getRight() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getLeft();
    +        } else if (equalToExpr.getRight() instanceof ColumnExpression &&
    +            equalToExpr.getLeft() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getRight();
    +        }
    +        break;
    +      default:
    +        break;
    +    }
    +    // Then check if filter column is in index columns
    +    if (null != filterColumn && dataMapMeta.getIndexedColumnNames().contains(
    --- End diff --
    
    for safety, convert to lowercase in both side to check


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/60/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @ravipesala 
    @kevinjmh will check your PR


---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r219376540
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -364,6 +365,13 @@ public DataMapMeta getMeta() {
         return null;
       }
     
    +  @Override
    +  public boolean isSupport(Expression expression) {
    +    // this method is used for choosing cg/fg datamap
    +    // for default datamap always return false
    +    return false;
    --- End diff --
    
    I get your point. I will change it to return true for better code reading. Also will add checking meta info( meta of default datamap is NULLnow) to avoid null point exception


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @kevinjmh I accepted another PR #2767 which also intends to fix this problem.
    You can close this PR now.
    Thanks for your working all the same!


---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r219066622
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---
    @@ -123,6 +129,42 @@ public BloomCoarseGrainDataMapFactory(CarbonTable carbonTable, DataMapSchema dat
         }
       }
     
    +  @Override
    +  public boolean isSupport(Expression expression) {
    +    ColumnExpression filterColumn = null;
    +    // First check type of child nodes, and get filter column name
    +    switch (expression.getFilterExpressionType()) {
    +      case IN:
    +        InExpression inExpr = (InExpression) expression;
    +        if (inExpr.getLeft() instanceof ColumnExpression &&
    +            inExpr.getRight() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getLeft();
    +        } else if (inExpr.getRight() instanceof ColumnExpression &&
    +            inExpr.getLeft() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getRight();
    +        }
    +        break;
    +      case EQUALS:
    +        EqualToExpression equalToExpr = (EqualToExpression) expression;
    +        if (equalToExpr.getLeft() instanceof ColumnExpression &&
    +            equalToExpr.getRight() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getLeft();
    +        } else if (equalToExpr.getRight() instanceof ColumnExpression &&
    +            equalToExpr.getLeft() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getRight();
    +        }
    +        break;
    +      default:
    +        break;
    --- End diff --
    
    just return 'false' here for unsupported expression


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    since it changes the datamap chooser logic, I'd recommend you to test it with larger amount of data and observe the pruning


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/222/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @ravipesala At last, I think current PR:
    1. do fix the bug
    2. has limited impact on other features (only impact bloomfilter datamap)
    and it can be accepted.
    
    Maybe in the next version, we can optimize the frame work by
    1. assembling expression that exactly can be supported by the index datamap
    2. supporting pipeline pruning (use a bitmap to indicate the hitted blocklet number in previous pruning procedure and use the bitmap in current pruning procedure)


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/359/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8122/



---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r219067294
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -364,6 +365,13 @@ public DataMapMeta getMeta() {
         return null;
       }
     
    +  @Override
    +  public boolean isSupport(Expression expression) {
    +    // this method is used for choosing cg/fg datamap
    +    // for default datamap always return false
    +    return false;
    --- End diff --
    
    I think  you can by default return true here, even it has no side effect, but it will make sense for code reading: default datamap can handle all the expressions


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    In addition, if the expression forwarded to bloomfilter datamap is `indexCol1 = 1 AND indexCol2 = 2`, bloomfilter datamap will prune the blocklets separately and giving result sets `blockletSet1` and `blockletSet2`. The bloomfilter has to do intersection of the two result sets based on the 'AND' relation of the two expressions. ---- That's what I called 'dirty work'.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/624/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8591/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/714/



---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r217945289
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -268,23 +238,38 @@ private ExpressionTuple selectDataMap(Expression expression, List<TableDataMap>
     
       private void extractColumnExpression(Expression expression,
           List<ColumnExpression> columnExpressions) {
    -    if (expression instanceof ColumnExpression) {
    -      columnExpressions.add((ColumnExpression) expression);
    -    } else if (expression instanceof MatchExpression) {
    -      // this is a special case for lucene
    -      // build a fake ColumnExpression to filter datamaps which contain target column
    -      // a Lucene query string is alike "column:query term"
    -      String[] queryItems = expression.getString().split(":", 2);
    -      if (queryItems.length == 2) {
    -        columnExpressions.add(new ColumnExpression(queryItems[0], null));
    -      }
    -    } else if (expression != null) {
    -      List<Expression> children = expression.getChildren();
    -      if (children != null && children.size() > 0) {
    -        for (Expression exp : children) {
    -          extractColumnExpression(exp, columnExpressions);
    +    switch (expression.getFilterExpressionType()) {
    --- End diff --
    
    I think these operation should not be in DataMapChooser.
    DataMapChooser is for common logic and should not handle specific datamap's logic.
    If you want to decide which expression will be supported by the specific datamap, here I do propose you to refactor the 'SUPPORTED_EXPRESSION' in the specific datamap. In that place, the datamap should declare what kind of operand and operator it will support and in DataMapChooser we just need to call that method and decide which expression will be handled by that datamap.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/520/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8606/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    LGTM


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    retest this please


---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Assign to datamap ...

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh closed the pull request at:

    https://github.com/apache/carbondata/pull/2665


---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r219066781
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---
    @@ -123,6 +129,42 @@ public BloomCoarseGrainDataMapFactory(CarbonTable carbonTable, DataMapSchema dat
         }
       }
     
    +  @Override
    +  public boolean isSupport(Expression expression) {
    +    ColumnExpression filterColumn = null;
    +    // First check type of child nodes, and get filter column name
    +    switch (expression.getFilterExpressionType()) {
    +      case IN:
    +        InExpression inExpr = (InExpression) expression;
    +        if (inExpr.getLeft() instanceof ColumnExpression &&
    +            inExpr.getRight() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getLeft();
    +        } else if (inExpr.getRight() instanceof ColumnExpression &&
    +            inExpr.getLeft() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getRight();
    +        }
    +        break;
    +      case EQUALS:
    +        EqualToExpression equalToExpr = (EqualToExpression) expression;
    +        if (equalToExpr.getLeft() instanceof ColumnExpression &&
    +            equalToExpr.getRight() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getLeft();
    +        } else if (equalToExpr.getRight() instanceof ColumnExpression &&
    +            equalToExpr.getLeft() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getRight();
    +        }
    +        break;
    +      default:
    +        break;
    +    }
    +    // Then check if filter column is in index columns
    +    if (null != filterColumn && dataMapMeta.getIndexedColumnNames().contains(
    --- End diff --
    
    will there be a case-sensitive problem here?


---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r219378413
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java ---
    @@ -123,6 +129,42 @@ public BloomCoarseGrainDataMapFactory(CarbonTable carbonTable, DataMapSchema dat
         }
       }
     
    +  @Override
    +  public boolean isSupport(Expression expression) {
    +    ColumnExpression filterColumn = null;
    +    // First check type of child nodes, and get filter column name
    +    switch (expression.getFilterExpressionType()) {
    +      case IN:
    +        InExpression inExpr = (InExpression) expression;
    +        if (inExpr.getLeft() instanceof ColumnExpression &&
    +            inExpr.getRight() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getLeft();
    +        } else if (inExpr.getRight() instanceof ColumnExpression &&
    +            inExpr.getLeft() instanceof ListExpression) {
    +          filterColumn = (ColumnExpression) inExpr.getRight();
    +        }
    +        break;
    +      case EQUALS:
    +        EqualToExpression equalToExpr = (EqualToExpression) expression;
    +        if (equalToExpr.getLeft() instanceof ColumnExpression &&
    +            equalToExpr.getRight() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getLeft();
    +        } else if (equalToExpr.getRight() instanceof ColumnExpression &&
    +            equalToExpr.getLeft() instanceof LiteralExpression) {
    +          filterColumn = (ColumnExpression) equalToExpr.getRight();
    +        }
    +        break;
    +      default:
    +        break;
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8589/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/557/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    I've verified this PR for another bug and it works fine.
    see http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Issue-Bloomfilter-datamap-td63254.html


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8885/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8460/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    retest this please


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/536/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @xuchuanyin Datamap expression merging is the basic framework feature but not tested, so we might expect some issues from it as when it was implemented there were no index datamaps implemented. So we should fix those issues instead of removing it. It changes the very nature of index datamap itself if you remove it.
    The expected feature should be as follows.
    When the user creates index datamap on multiple columns then it should be treated as a composite index, not an individual index. 
    The user should create individual datamaps on the columns if he wants an individual index on the column.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @xuchuanyin Please check my fix in the PR https://github.com/apache/carbondata/pull/2767


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8707/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @xuchuanyin when user creates datamap on multiple columns then it should be composite index. But bloom creates index for individual columns even though user creates datamap on multiple columns. If user wants index for individual columns then he should create multiple datamaps not a single datamap.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/346/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @ravipesala  I do think assembling expressions and forward them to the corresponding datamap is a good idea. But current implementation needs more work to achieve this goal. We can not just pass the expressions to datamaps and let the datamap do the dirty work to do intersection or union of each expression.
    
    As you mentioned we can create multiple individual datamaps, this will also be a problem now. For example bloomfilter datamap only support 'equalTo' and 'in', but current DataMapChooser will forward 'NotEqual','Greater','Less'... to the datamap also, which will cause performance waste.
    
    Besides, for both BloomFilter datamap and Lucene datamap, we never treat specifying multiple columns in 'index_columns' as composite index, actually we just index them individually.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @xuchuanyin In general if a datamap is created with a list of columns for suppose c1 and c2 then we create an index datamap on c1 and c2 and treat that as a composite index.  So if user queries with filters on both the columns then indexdatamap should combine both and make a single call of prune of that datamap because user has created composite datamap. 


---

[GitHub] carbondata pull request #2665: [CARBONDATA-2897][DataMap] Optimize datamap c...

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2665#discussion_r218679293
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -268,23 +238,38 @@ private ExpressionTuple selectDataMap(Expression expression, List<TableDataMap>
     
       private void extractColumnExpression(Expression expression,
           List<ColumnExpression> columnExpressions) {
    -    if (expression instanceof ColumnExpression) {
    -      columnExpressions.add((ColumnExpression) expression);
    -    } else if (expression instanceof MatchExpression) {
    -      // this is a special case for lucene
    -      // build a fake ColumnExpression to filter datamaps which contain target column
    -      // a Lucene query string is alike "column:query term"
    -      String[] queryItems = expression.getString().split(":", 2);
    -      if (queryItems.length == 2) {
    -        columnExpressions.add(new ColumnExpression(queryItems[0], null));
    -      }
    -    } else if (expression != null) {
    -      List<Expression> children = expression.getChildren();
    -      if (children != null && children.size() > 0) {
    -        for (Expression exp : children) {
    -          extractColumnExpression(exp, columnExpressions);
    +    switch (expression.getFilterExpressionType()) {
    --- End diff --
    
    Change to check by method `isSupport` in datamap factory


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/344/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8590/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6441/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8781/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/379/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8131/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by kevinjmh <gi...@git.apache.org>.

Github user kevinjmh commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    retest this please


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/8627/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @ravipesala 
    It's a good idea to do multiple pruning at one time, but it also has shortcomings.
    
    1. The datamap has to deal with the AND/OR logic of the composite expressions. If the expressions are AND, it needs to do intersection of the pruned result; If the expressions are OR, it needs to do union of the pruned result. It will makes the datamap complicated to handle.
    
    2. Besides, you can see it  from the maillist that, current expressions forwarded to BloomDataMap contains too much unwanted expressions.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/536/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by xuchuanyin <gi...@git.apache.org>.

Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    @ravipesala What I want to say is that even if we support composite index, the current datamap chooser still failed to work, because it composites irrelevant expressions and forwards that to the datamap ignoring that the expression contains non-index columns. You can see it from the logs in the maillist that currently we have to deal 27 expressions while after applying this patch we only need to deal 4 expressions.


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/521/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Assign to datamap only fo...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/816/



---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by QiangCai <gi...@git.apache.org>.

Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    retest this please


---

[GitHub] carbondata issue #2665: [CARBONDATA-2897][DataMap] Optimize datamap chooser

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2665
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/637/



---