You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by manishgupta88 <gi...@git.apache.org> on 2018/07/09 11:58:55 UTC

[GitHub] carbondata pull request #2467: [WIP] [CARBONDATA-2649] Add code for caching ...

GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2467

    [WIP] [CARBONDATA-2649] Add code for caching min/max only for specified columns

    Things done as part of this PR
    1. Supported configuring column for caching min/max in driver
    2. Added test cases for query verification for COLUMN_META_CACHE and CACHE_LEVEL properties
    3. Handled comments for PR #2454 
    
    This PR is dependent on PR #2454 
    
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
    Yes. Will be done as part of jira-2651
     - [ ] Testing done
    In Progress
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata configurable_cache_columns

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2467
    
----
commit 30c96944fa11a63cabfcecbd1fd45643c68106f2
Author: manishgupta88 <to...@...>
Date:   2018-07-04T15:30:54Z

    Refactor Block and Blocklet DataMap to store only segmentProeprties Index instead of segmentProperties

commit b7a1c24a284b56e7bb95029c37256abadea5cbd7
Author: manishgupta88 <to...@...>
Date:   2018-07-08T08:54:21Z

    supported configuring column for min/max caching in driver

----


---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5773/



---

[GitHub] carbondata pull request #2467: [CARBONDATA-2649] Add code for caching min/ma...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2467


---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6969/



---

[GitHub] carbondata pull request #2467: [CARBONDATA-2649] Add code for caching min/ma...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2467#discussion_r201242748
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
    @@ -289,20 +293,24 @@ public SegmentProperties getSegmentProperties() {
           return columnCardinality;
         }
     
    -    public CarbonRowSchema[] getBlockSchema() {
    -      return SchemaGenerator.createBlockSchema(segmentProperties);
    +    public synchronized CarbonRowSchema[] getTaskSummarySchema() {
    +      return taskSummarySchema;
         }
     
    -    public CarbonRowSchema[] getBlocketSchema() {
    -      return SchemaGenerator.createBlockletSchema(segmentProperties);
    +    public synchronized void setTaskSummarySchema(CarbonRowSchema[] taskSummarySchema) {
    --- End diff --
    
    I think synchronized is no required here , please check 


---

[GitHub] carbondata issue #2467: [WIP] [CARBONDATA-2649] Add code for caching min/max...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5733/



---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5744/



---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Lgtm


---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    @kumarvishal09 ...handled review comments...kindly review and merge


---

[GitHub] carbondata pull request #2467: [CARBONDATA-2649] Add code for caching min/ma...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2467#discussion_r201292476
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java ---
    @@ -251,6 +273,137 @@ private static FilterExecuter getIncludeFilterExecuter(
         }
       }
     
    +  /**
    +   * check if current need to be replaced with TrueFilter expression. This will happen in case
    +   * filter column min/max is not cached in the driver
    +   *
    +   * @param dimColEvaluatorInfoList
    +   * @param msrColEvaluatorInfoList
    +   * @param segmentProperties
    +   * @param minMaxCacheColumns
    +   * @return
    +   */
    +  private static boolean checkIfCurrentNodeToBeReplacedWithTrueFilterExpression(
    +      List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,
    +      List<MeasureColumnResolvedFilterInfo> msrColEvaluatorInfoList,
    +      SegmentProperties segmentProperties, List<CarbonColumn> minMaxCacheColumns) {
    +    boolean replaceCurrentNodeWithTrueFilter = false;
    +    ColumnResolvedFilterInfo columnResolvedFilterInfo = null;
    +    if (!msrColEvaluatorInfoList.isEmpty()) {
    +      columnResolvedFilterInfo = msrColEvaluatorInfoList.get(0);
    +      replaceCurrentNodeWithTrueFilter =
    +          checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
    +              minMaxCacheColumns, true);
    +    } else {
    +      columnResolvedFilterInfo = dimColEvaluatorInfoList.get(0);
    +      if (!columnResolvedFilterInfo.getDimension().hasEncoding(Encoding.IMPLICIT)) {
    +        replaceCurrentNodeWithTrueFilter =
    +            checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
    +                minMaxCacheColumns, false);
    +      }
    +    }
    +    return replaceCurrentNodeWithTrueFilter;
    +  }
    +
    +  /**
    +   * check if current need to be replaced with TrueFilter expression. This will happen in case
    +   * filter column min/max is not cached in the driver
    +   *
    +   * @param filterExpressionResolverTree
    +   * @param segmentProperties
    +   * @param minMaxCacheColumns
    +   * @return
    +   */
    +  private static boolean checkIfCurrentNodeToBeReplacedWithTrueFilterExpression(
    +      FilterResolverIntf filterExpressionResolverTree, SegmentProperties segmentProperties,
    +      List<CarbonColumn> minMaxCacheColumns) {
    +    boolean replaceCurrentNodeWithTrueFilter = false;
    +    ColumnResolvedFilterInfo columnResolvedFilterInfo = null;
    +    if (null != filterExpressionResolverTree.getMsrColResolvedFilterInfo()) {
    +      columnResolvedFilterInfo = filterExpressionResolverTree.getMsrColResolvedFilterInfo();
    +      replaceCurrentNodeWithTrueFilter =
    +          checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
    +              minMaxCacheColumns, true);
    +    } else {
    +      columnResolvedFilterInfo = filterExpressionResolverTree.getDimColResolvedFilterInfo();
    +      if (!columnResolvedFilterInfo.getDimension().hasEncoding(Encoding.IMPLICIT)) {
    +        replaceCurrentNodeWithTrueFilter =
    +            checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
    +                minMaxCacheColumns, false);
    +      }
    +    }
    +    return replaceCurrentNodeWithTrueFilter;
    +  }
    +
    +  /**
    +   * Method to check whether current node needs to be replaced with true filter to avoid pruning
    +   * for case when filter column is not cached in the min/max cached dimension
    +   *
    +   * @param columnResolvedFilterInfo
    +   * @param segmentProperties
    +   * @param minMaxCacheColumns
    +   * @param isMeasure
    +   * @return
    +   */
    +  private static boolean checkIfFilterColumnIsCachedInDriver(
    +      ColumnResolvedFilterInfo columnResolvedFilterInfo, SegmentProperties segmentProperties,
    +      List<CarbonColumn> minMaxCacheColumns, boolean isMeasure) {
    +    boolean replaceCurrentNodeWithTrueFilter = false;
    +    CarbonColumn columnFromCurrentBlock = null;
    +    if (isMeasure) {
    +      columnFromCurrentBlock = segmentProperties
    +          .getMeasureFromCurrentBlock(columnResolvedFilterInfo.getMeasure().getColumnId());
    +    } else {
    +      columnFromCurrentBlock =
    +          segmentProperties.getDimensionFromCurrentBlock(columnResolvedFilterInfo.getDimension());
    +    }
    +    if (null != columnFromCurrentBlock) {
    +      // check for filter dimension in the cached column list
    +      if (null != minMaxCacheColumns) {
    +        int columnIndexInMinMaxByteArray =
    +            getFilterDimensionIndexInCachedColumns(minMaxCacheColumns, columnFromCurrentBlock);
    +        if (columnIndexInMinMaxByteArray != -1) {
    +          columnResolvedFilterInfo.setColumnIndexInMinMaxByteArray(columnIndexInMinMaxByteArray);
    +        } else {
    +          // will be true only if column caching is enabled and current filter column is not cached
    +          replaceCurrentNodeWithTrueFilter = true;
    +        }
    +      } else {
    +        // if columns to be cached are not specified then in that case all columns will be cached
    +        // and  then the ordinal of column will be its index in the min/max byte array
    +        if (isMeasure) {
    +          columnResolvedFilterInfo.setColumnIndexInMinMaxByteArray(
    +              segmentProperties.getLastDimensionColOrdinal() + columnFromCurrentBlock.getOrdinal());
    +        } else {
    +          columnResolvedFilterInfo
    +              .setColumnIndexInMinMaxByteArray(columnFromCurrentBlock.getOrdinal());
    +        }
    +      }
    +    }
    +    return replaceCurrentNodeWithTrueFilter;
    +  }
    +
    +  /**
    +   * Method to check whether the filter dimension exists in the cached dimensions for a table
    +   *
    +   * @param carbonDimensionsToBeCached
    +   * @param filterColumn
    +   * @return
    +   */
    +  private static int getFilterDimensionIndexInCachedColumns(
    --- End diff --
    
    Change method name as it will be called for both dimension and measure columns


---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5747/



---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6964/



---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6993/



---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    @manishgupta88 
    LGTM except 2 minor comments , please re base


---

[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5726/



---

[GitHub] carbondata issue #2467: [WIP] [CARBONDATA-2649] Add code for caching min/max...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2467
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6958/



---