You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2018/04/21 16:38:55 UTC

[GitHub] carbondata pull request #2204: [WIP] Added CG prune before FG prune.

GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/2204

    [WIP]  Added CG prune before FG prune.

    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata datamap-prune-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2204
    
----
commit 4c0a4efaaaa6f4f23d274e75e333f18e3d5b9226
Author: ravipesala <ra...@...>
Date:   2018-04-21T16:29:50Z

    Added CG prune before FG prune.

----


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230082
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    +      throws IOException {
    +    if (resolverIntf != null) {
    +      Expression expression = resolverIntf.getFilterExpression();
    +      // First check for FG datamaps if any exist
    +      List<TableDataMap> allDataMapFG =
    +          DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
    +      ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
    +      if (tuple.dataMapExprWrapper != null) {
    +        return tuple.dataMapExprWrapper;
    +      }
    +    }
    +    // Return the default datamap if no other datamap exists.
    +    return null;
    +  }
    +
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    --- End diff --
    
    `datamap` to `CG datamap`


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230077
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    --- End diff --
    
    change `datamap` to `FG datamap`


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183233431
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
             .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                 CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    +      prunedBlocklets =
    +          executeDataMapJob(carbonTable, resolver, segmentIds, dataMapExprWrapper, dataMapJob,
    +              partitionsToPrune);
         } else {
           prunedBlocklets = dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
         }
    +    dataMapExprWrapper =
    +        DataMapChooser.get().chooseFG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +    if (dataMapExprWrapper != null &&
    +        dataMapExprWrapper.getDataMapType() == DataMapLevel.FG &&
    +        isFgDataMapPruningEnable(job.getConfiguration())) {
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #2204: [CARBONDATA-2375] Added CG prune before FG prune.

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2204
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5302/



---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230236
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -417,6 +421,38 @@ protected Expression getFilterPredicates(Configuration configuration) {
         return resultFilterredBlocks;
       }
     
    +  private List<ExtendedBlocklet> executeDataMapJob(CarbonTable carbonTable,
    +      FilterResolverIntf resolver, List<Segment> segmentIds, DataMapExprWrapper dataMapExprWrapper,
    +      DataMapJob dataMapJob, List<PartitionSpec> partitionsToPrune) throws IOException {
    +    DistributableDataMapFormat datamapDstr =
    +        new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    +            partitionsToPrune, BlockletDataMapFactory.class.getName());
    +    List<ExtendedBlocklet> prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    +    // Apply expression on the blocklets.
    +    prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    return prunedBlocklets;
    +  }
    +
    +  private void updateSegments(List<Segment> segments, List<ExtendedBlocklet> prunedBlocklets) {
    --- End diff --
    
    please provide comment for this function, it is not easy to understand what it does


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230061
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --
    
    I think better to rename to `chooseFGDataMap`


---

[GitHub] carbondata issue #2204: [CARBONDATA-2375] Added CG prune before FG prune.

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2204
  
    LGTM


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183233432
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -417,6 +421,38 @@ protected Expression getFilterPredicates(Configuration configuration) {
         return resultFilterredBlocks;
       }
     
    +  private List<ExtendedBlocklet> executeDataMapJob(CarbonTable carbonTable,
    +      FilterResolverIntf resolver, List<Segment> segmentIds, DataMapExprWrapper dataMapExprWrapper,
    +      DataMapJob dataMapJob, List<PartitionSpec> partitionsToPrune) throws IOException {
    +    DistributableDataMapFormat datamapDstr =
    +        new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    +            partitionsToPrune, BlockletDataMapFactory.class.getName());
    +    List<ExtendedBlocklet> prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    +    // Apply expression on the blocklets.
    +    prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    return prunedBlocklets;
    +  }
    +
    +  private void updateSegments(List<Segment> segments, List<ExtendedBlocklet> prunedBlocklets) {
    --- End diff --
    
    ok, added


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230067
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    +      throws IOException {
    +    if (resolverIntf != null) {
    +      Expression expression = resolverIntf.getFilterExpression();
    +      // First check for FG datamaps if any exist
    +      List<TableDataMap> allDataMapFG =
    +          DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
    +      ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
    +      if (tuple.dataMapExprWrapper != null) {
    +        return tuple.dataMapExprWrapper;
    +      }
    +    }
    +    // Return the default datamap if no other datamap exists.
    +    return null;
    +  }
    +
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseCG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --
    
    rename to `chooseCGDataMap`


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2204


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183233439
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #2204: [WIP] Added CG prune before FG prune.

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2204
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5283/



---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183233442
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    +      throws IOException {
    +    if (resolverIntf != null) {
    +      Expression expression = resolverIntf.getFilterExpression();
    +      // First check for FG datamaps if any exist
    +      List<TableDataMap> allDataMapFG =
    +          DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
    +      ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
    +      if (tuple.dataMapExprWrapper != null) {
    +        return tuple.dataMapExprWrapper;
    +      }
    +    }
    +    // Return the default datamap if no other datamap exists.
    +    return null;
    +  }
    +
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseCG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #2204: [WIP] Added CG prune before FG prune.

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2204
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4103/



---

[GitHub] carbondata issue #2204: [CARBONDATA-2375] Added CG prune before FG prune.

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2204
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4122/



---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183233426
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
             .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                 CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    --- End diff --
    
    Ok, corrected now. First always prune with Default datamap and then further pruned with remaining datamaps.


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230227
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
             .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                 CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    +      prunedBlocklets =
    +          executeDataMapJob(carbonTable, resolver, segmentIds, dataMapExprWrapper, dataMapJob,
    +              partitionsToPrune);
         } else {
           prunedBlocklets = dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
         }
    +    dataMapExprWrapper =
    +        DataMapChooser.get().chooseFG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +    if (dataMapExprWrapper != null &&
    +        dataMapExprWrapper.getDataMapType() == DataMapLevel.FG &&
    +        isFgDataMapPruningEnable(job.getConfiguration())) {
    --- End diff --
    
    seems `datatMapJob != null` is missing


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183230216
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
             .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
                 CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
         DataMapExprWrapper dataMapExprWrapper =
    -        DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    +        DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
    -    if (dataMapJob != null &&
    -        (distributedCG ||
    -        (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
    -      DistributableDataMapFormat datamapDstr =
    -          new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
    -              partitionsToPrune, BlockletDataMapFactory.class.getName());
    -      prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
    -      // Apply expression on the blocklets.
    -      prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
    +    if (distributedCG) {
    --- End diff --
    
    I am a bit confused. For CG, there is always a BlockletDataMap (CG), and user may define additional CG datamap. So there are 1 + N CG datamap where N >=0. In this case, we should prune using BlockletDataMap first, then use user defined CG datamap. Am I right?
    If I am correct, this if check should change to `if (dataMapJob != null)`?


---

[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2204#discussion_r183233446
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
    @@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
             resolverIntf);
       }
     
    +  /**
    +   * Return a chosen datamap based on input filter. See {@link DataMapChooser}
    +   */
    +  public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
    --- End diff --
    
    ok


---