You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2018/04/21 16:38:55 UTC
[GitHub] carbondata pull request #2204: [WIP] Added CG prune before FG prune.
GitHub user ravipesala opened a pull request:
https://github.com/apache/carbondata/pull/2204
[WIP] Added CG prune before FG prune.
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ravipesala/incubator-carbondata datamap-prune-fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2204.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2204
----
commit 4c0a4efaaaa6f4f23d274e75e333f18e3d5b9226
Author: ravipesala <ra...@...>
Date: 2018-04-21T16:29:50Z
Added CG prune before FG prune.
----
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230082
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
+ throws IOException {
+ if (resolverIntf != null) {
+ Expression expression = resolverIntf.getFilterExpression();
+ // First check for FG datamaps if any exist
+ List<TableDataMap> allDataMapFG =
+ DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
+ ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
+ if (tuple.dataMapExprWrapper != null) {
+ return tuple.dataMapExprWrapper;
+ }
+ }
+ // Return the default datamap if no other datamap exists.
+ return null;
+ }
+
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
--- End diff --
`datamap` to `CG datamap`
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230077
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
--- End diff --
change `datamap` to `FG datamap`
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183233431
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
.getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
DataMapExprWrapper dataMapExprWrapper =
- DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
+ DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
List<ExtendedBlocklet> prunedBlocklets;
- DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
- if (dataMapJob != null &&
- (distributedCG ||
- (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
- DistributableDataMapFormat datamapDstr =
- new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
- partitionsToPrune, BlockletDataMapFactory.class.getName());
- prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
- // Apply expression on the blocklets.
- prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
+ if (distributedCG) {
+ prunedBlocklets =
+ executeDataMapJob(carbonTable, resolver, segmentIds, dataMapExprWrapper, dataMapJob,
+ partitionsToPrune);
} else {
prunedBlocklets = dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
}
+ dataMapExprWrapper =
+ DataMapChooser.get().chooseFG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
+ if (dataMapExprWrapper != null &&
+ dataMapExprWrapper.getDataMapType() == DataMapLevel.FG &&
+ isFgDataMapPruningEnable(job.getConfiguration())) {
--- End diff --
ok
---
[GitHub] carbondata issue #2204: [CARBONDATA-2375] Added CG prune before FG prune.
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2204
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5302/
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230236
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -417,6 +421,38 @@ protected Expression getFilterPredicates(Configuration configuration) {
return resultFilterredBlocks;
}
+ private List<ExtendedBlocklet> executeDataMapJob(CarbonTable carbonTable,
+ FilterResolverIntf resolver, List<Segment> segmentIds, DataMapExprWrapper dataMapExprWrapper,
+ DataMapJob dataMapJob, List<PartitionSpec> partitionsToPrune) throws IOException {
+ DistributableDataMapFormat datamapDstr =
+ new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
+ partitionsToPrune, BlockletDataMapFactory.class.getName());
+ List<ExtendedBlocklet> prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
+ // Apply expression on the blocklets.
+ prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
+ return prunedBlocklets;
+ }
+
+ private void updateSegments(List<Segment> segments, List<ExtendedBlocklet> prunedBlocklets) {
--- End diff --
please provide comment for this function, it is not easy to understand what it does
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230061
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
--- End diff --
I think better to rename to `chooseFGDataMap`
---
[GitHub] carbondata issue #2204: [CARBONDATA-2375] Added CG prune before FG prune.
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2204
LGTM
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183233432
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -417,6 +421,38 @@ protected Expression getFilterPredicates(Configuration configuration) {
return resultFilterredBlocks;
}
+ private List<ExtendedBlocklet> executeDataMapJob(CarbonTable carbonTable,
+ FilterResolverIntf resolver, List<Segment> segmentIds, DataMapExprWrapper dataMapExprWrapper,
+ DataMapJob dataMapJob, List<PartitionSpec> partitionsToPrune) throws IOException {
+ DistributableDataMapFormat datamapDstr =
+ new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
+ partitionsToPrune, BlockletDataMapFactory.class.getName());
+ List<ExtendedBlocklet> prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
+ // Apply expression on the blocklets.
+ prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
+ return prunedBlocklets;
+ }
+
+ private void updateSegments(List<Segment> segments, List<ExtendedBlocklet> prunedBlocklets) {
--- End diff --
ok, added
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230067
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
+ throws IOException {
+ if (resolverIntf != null) {
+ Expression expression = resolverIntf.getFilterExpression();
+ // First check for FG datamaps if any exist
+ List<TableDataMap> allDataMapFG =
+ DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
+ ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
+ if (tuple.dataMapExprWrapper != null) {
+ return tuple.dataMapExprWrapper;
+ }
+ }
+ // Return the default datamap if no other datamap exists.
+ return null;
+ }
+
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseCG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
--- End diff --
rename to `chooseCGDataMap`
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2204
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183233439
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
--- End diff --
ok
---
[GitHub] carbondata issue #2204: [WIP] Added CG prune before FG prune.
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2204
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5283/
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183233442
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
+ throws IOException {
+ if (resolverIntf != null) {
+ Expression expression = resolverIntf.getFilterExpression();
+ // First check for FG datamaps if any exist
+ List<TableDataMap> allDataMapFG =
+ DataMapStoreManager.getInstance().getAllDataMap(carbonTable, DataMapLevel.FG);
+ ExpressionTuple tuple = selectDataMap(expression, allDataMapFG, resolverIntf);
+ if (tuple.dataMapExprWrapper != null) {
+ return tuple.dataMapExprWrapper;
+ }
+ }
+ // Return the default datamap if no other datamap exists.
+ return null;
+ }
+
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseCG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
--- End diff --
ok
---
[GitHub] carbondata issue #2204: [WIP] Added CG prune before FG prune.
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2204
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4103/
---
[GitHub] carbondata issue #2204: [CARBONDATA-2375] Added CG prune before FG prune.
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2204
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4122/
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183233426
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
.getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
DataMapExprWrapper dataMapExprWrapper =
- DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
+ DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
List<ExtendedBlocklet> prunedBlocklets;
- DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
- if (dataMapJob != null &&
- (distributedCG ||
- (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
- DistributableDataMapFormat datamapDstr =
- new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
- partitionsToPrune, BlockletDataMapFactory.class.getName());
- prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
- // Apply expression on the blocklets.
- prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
+ if (distributedCG) {
--- End diff --
Ok, corrected now. First always prune with Default datamap and then further pruned with remaining datamaps.
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230227
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
.getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
DataMapExprWrapper dataMapExprWrapper =
- DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
+ DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
List<ExtendedBlocklet> prunedBlocklets;
- DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
- if (dataMapJob != null &&
- (distributedCG ||
- (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
- DistributableDataMapFormat datamapDstr =
- new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
- partitionsToPrune, BlockletDataMapFactory.class.getName());
- prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
- // Apply expression on the blocklets.
- prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
+ if (distributedCG) {
+ prunedBlocklets =
+ executeDataMapJob(carbonTable, resolver, segmentIds, dataMapExprWrapper, dataMapJob,
+ partitionsToPrune);
} else {
prunedBlocklets = dataMapExprWrapper.prune(segmentIds, partitionsToPrune);
}
+ dataMapExprWrapper =
+ DataMapChooser.get().chooseFG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
+ if (dataMapExprWrapper != null &&
+ dataMapExprWrapper.getDataMapType() == DataMapLevel.FG &&
+ isFgDataMapPruningEnable(job.getConfiguration())) {
--- End diff --
seems `datatMapJob != null` is missing
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183230216
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
@@ -359,23 +359,27 @@ protected Expression getFilterPredicates(Configuration configuration) {
.getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
DataMapExprWrapper dataMapExprWrapper =
- DataMapChooser.get().choose(getOrCreateCarbonTable(job.getConfiguration()), resolver);
+ DataMapChooser.get().chooseCG(getOrCreateCarbonTable(job.getConfiguration()), resolver);
DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
List<PartitionSpec> partitionsToPrune = getPartitionsToPrune(job.getConfiguration());
List<ExtendedBlocklet> prunedBlocklets;
- DataMapLevel dataMapLevel = dataMapExprWrapper.getDataMapType();
- if (dataMapJob != null &&
- (distributedCG ||
- (dataMapLevel == DataMapLevel.FG && isFgDataMapPruningEnable(job.getConfiguration())))) {
- DistributableDataMapFormat datamapDstr =
- new DistributableDataMapFormat(carbonTable, dataMapExprWrapper, segmentIds,
- partitionsToPrune, BlockletDataMapFactory.class.getName());
- prunedBlocklets = dataMapJob.execute(datamapDstr, resolver);
- // Apply expression on the blocklets.
- prunedBlocklets = dataMapExprWrapper.pruneBlocklets(prunedBlocklets);
+ if (distributedCG) {
--- End diff --
I am a bit confused. For CG, there is always a BlockletDataMap (CG), and user may define additional CG datamap. So there are 1 + N CG datamap where N >=0. In this case, we should prune using BlockletDataMap first, then use user defined CG datamap. Am I right?
If I am correct, this if check should change to `if (dataMapJob != null)`?
---
[GitHub] carbondata pull request #2204: [CARBONDATA-2375] Added CG prune before FG pr...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2204#discussion_r183233446
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java ---
@@ -93,6 +93,48 @@ public DataMapExprWrapper choose(CarbonTable carbonTable, FilterResolverIntf res
resolverIntf);
}
+ /**
+ * Return a chosen datamap based on input filter. See {@link DataMapChooser}
+ */
+ public DataMapExprWrapper chooseFG(CarbonTable carbonTable, FilterResolverIntf resolverIntf)
--- End diff --
ok
---