You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by manishgupta88 <gi...@git.apache.org> on 2018/07/09 11:58:55 UTC
[GitHub] carbondata pull request #2467: [WIP] [CARBONDATA-2649] Add code for caching ...
GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/2467
[WIP] [CARBONDATA-2649] Add code for caching min/max only for specified columns
Things done as part of this PR
1. Supported configuring column for caching min/max in driver
2. Added test cases for query verification for COLUMN_META_CACHE and CACHE_LEVEL properties
3. Handled comments for PR #2454
This PR is dependent on PR #2454
- [ ] Any interfaces changed?
No
- [ ] Any backward compatibility impacted?
No
- [ ] Document update required?
Yes. Will be done as part of jira-2651
- [ ] Testing done
In Progress
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishgupta88/carbondata configurable_cache_columns
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2467.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2467
----
commit 30c96944fa11a63cabfcecbd1fd45643c68106f2
Author: manishgupta88 <to...@...>
Date: 2018-07-04T15:30:54Z
Refactor Block and Blocklet DataMap to store only segmentProeprties Index instead of segmentProperties
commit b7a1c24a284b56e7bb95029c37256abadea5cbd7
Author: manishgupta88 <to...@...>
Date: 2018-07-08T08:54:21Z
supported configuring column for min/max caching in driver
----
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5773/
---
[GitHub] carbondata pull request #2467: [CARBONDATA-2649] Add code for caching min/ma...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2467
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6969/
---
[GitHub] carbondata pull request #2467: [CARBONDATA-2649] Add code for caching min/ma...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2467#discussion_r201242748
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java ---
@@ -289,20 +293,24 @@ public SegmentProperties getSegmentProperties() {
return columnCardinality;
}
- public CarbonRowSchema[] getBlockSchema() {
- return SchemaGenerator.createBlockSchema(segmentProperties);
+ public synchronized CarbonRowSchema[] getTaskSummarySchema() {
+ return taskSummarySchema;
}
- public CarbonRowSchema[] getBlocketSchema() {
- return SchemaGenerator.createBlockletSchema(segmentProperties);
+ public synchronized void setTaskSummarySchema(CarbonRowSchema[] taskSummarySchema) {
--- End diff --
I think synchronized is no required here , please check
---
[GitHub] carbondata issue #2467: [WIP] [CARBONDATA-2649] Add code for caching min/max...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5733/
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5744/
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2467
Lgtm
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:
https://github.com/apache/carbondata/pull/2467
@kumarvishal09 ...handled review comments...kindly review and merge
---
[GitHub] carbondata pull request #2467: [CARBONDATA-2649] Add code for caching min/ma...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2467#discussion_r201292476
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java ---
@@ -251,6 +273,137 @@ private static FilterExecuter getIncludeFilterExecuter(
}
}
+ /**
+ * check if current need to be replaced with TrueFilter expression. This will happen in case
+ * filter column min/max is not cached in the driver
+ *
+ * @param dimColEvaluatorInfoList
+ * @param msrColEvaluatorInfoList
+ * @param segmentProperties
+ * @param minMaxCacheColumns
+ * @return
+ */
+ private static boolean checkIfCurrentNodeToBeReplacedWithTrueFilterExpression(
+ List<DimColumnResolvedFilterInfo> dimColEvaluatorInfoList,
+ List<MeasureColumnResolvedFilterInfo> msrColEvaluatorInfoList,
+ SegmentProperties segmentProperties, List<CarbonColumn> minMaxCacheColumns) {
+ boolean replaceCurrentNodeWithTrueFilter = false;
+ ColumnResolvedFilterInfo columnResolvedFilterInfo = null;
+ if (!msrColEvaluatorInfoList.isEmpty()) {
+ columnResolvedFilterInfo = msrColEvaluatorInfoList.get(0);
+ replaceCurrentNodeWithTrueFilter =
+ checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
+ minMaxCacheColumns, true);
+ } else {
+ columnResolvedFilterInfo = dimColEvaluatorInfoList.get(0);
+ if (!columnResolvedFilterInfo.getDimension().hasEncoding(Encoding.IMPLICIT)) {
+ replaceCurrentNodeWithTrueFilter =
+ checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
+ minMaxCacheColumns, false);
+ }
+ }
+ return replaceCurrentNodeWithTrueFilter;
+ }
+
+ /**
+ * check if current need to be replaced with TrueFilter expression. This will happen in case
+ * filter column min/max is not cached in the driver
+ *
+ * @param filterExpressionResolverTree
+ * @param segmentProperties
+ * @param minMaxCacheColumns
+ * @return
+ */
+ private static boolean checkIfCurrentNodeToBeReplacedWithTrueFilterExpression(
+ FilterResolverIntf filterExpressionResolverTree, SegmentProperties segmentProperties,
+ List<CarbonColumn> minMaxCacheColumns) {
+ boolean replaceCurrentNodeWithTrueFilter = false;
+ ColumnResolvedFilterInfo columnResolvedFilterInfo = null;
+ if (null != filterExpressionResolverTree.getMsrColResolvedFilterInfo()) {
+ columnResolvedFilterInfo = filterExpressionResolverTree.getMsrColResolvedFilterInfo();
+ replaceCurrentNodeWithTrueFilter =
+ checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
+ minMaxCacheColumns, true);
+ } else {
+ columnResolvedFilterInfo = filterExpressionResolverTree.getDimColResolvedFilterInfo();
+ if (!columnResolvedFilterInfo.getDimension().hasEncoding(Encoding.IMPLICIT)) {
+ replaceCurrentNodeWithTrueFilter =
+ checkIfFilterColumnIsCachedInDriver(columnResolvedFilterInfo, segmentProperties,
+ minMaxCacheColumns, false);
+ }
+ }
+ return replaceCurrentNodeWithTrueFilter;
+ }
+
+ /**
+ * Method to check whether current node needs to be replaced with true filter to avoid pruning
+ * for case when filter column is not cached in the min/max cached dimension
+ *
+ * @param columnResolvedFilterInfo
+ * @param segmentProperties
+ * @param minMaxCacheColumns
+ * @param isMeasure
+ * @return
+ */
+ private static boolean checkIfFilterColumnIsCachedInDriver(
+ ColumnResolvedFilterInfo columnResolvedFilterInfo, SegmentProperties segmentProperties,
+ List<CarbonColumn> minMaxCacheColumns, boolean isMeasure) {
+ boolean replaceCurrentNodeWithTrueFilter = false;
+ CarbonColumn columnFromCurrentBlock = null;
+ if (isMeasure) {
+ columnFromCurrentBlock = segmentProperties
+ .getMeasureFromCurrentBlock(columnResolvedFilterInfo.getMeasure().getColumnId());
+ } else {
+ columnFromCurrentBlock =
+ segmentProperties.getDimensionFromCurrentBlock(columnResolvedFilterInfo.getDimension());
+ }
+ if (null != columnFromCurrentBlock) {
+ // check for filter dimension in the cached column list
+ if (null != minMaxCacheColumns) {
+ int columnIndexInMinMaxByteArray =
+ getFilterDimensionIndexInCachedColumns(minMaxCacheColumns, columnFromCurrentBlock);
+ if (columnIndexInMinMaxByteArray != -1) {
+ columnResolvedFilterInfo.setColumnIndexInMinMaxByteArray(columnIndexInMinMaxByteArray);
+ } else {
+ // will be true only if column caching is enabled and current filter column is not cached
+ replaceCurrentNodeWithTrueFilter = true;
+ }
+ } else {
+ // if columns to be cached are not specified then in that case all columns will be cached
+ // and then the ordinal of column will be its index in the min/max byte array
+ if (isMeasure) {
+ columnResolvedFilterInfo.setColumnIndexInMinMaxByteArray(
+ segmentProperties.getLastDimensionColOrdinal() + columnFromCurrentBlock.getOrdinal());
+ } else {
+ columnResolvedFilterInfo
+ .setColumnIndexInMinMaxByteArray(columnFromCurrentBlock.getOrdinal());
+ }
+ }
+ }
+ return replaceCurrentNodeWithTrueFilter;
+ }
+
+ /**
+ * Method to check whether the filter dimension exists in the cached dimensions for a table
+ *
+ * @param carbonDimensionsToBeCached
+ * @param filterColumn
+ * @return
+ */
+ private static int getFilterDimensionIndexInCachedColumns(
--- End diff --
Change method name as it will be called for both dimension and measure columns
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2467
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5747/
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6964/
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6993/
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2467
@manishgupta88
LGTM except 2 minor comments , please re base
---
[GitHub] carbondata issue #2467: [CARBONDATA-2649] Add code for caching min/max only ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2467
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5726/
---
[GitHub] carbondata issue #2467: [WIP] [CARBONDATA-2649] Add code for caching min/max...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2467
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6958/
---