Posted to issues@carbondata.apache.org by ajantha-bhat <gi...@git.apache.org> on 2018/11/23 13:22:36 UTC
[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...
GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2949
[WIP] support parallel block pruning for non-default datamaps
This PR depends on #2936.
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ajantha-bhat/carbondata working_backup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2949.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2949
----
commit 6237d69fcc0ddc1a08c74579762b721108a251fe
Author: ajantha-bhat <aj...@...>
Date: 2018-11-20T16:45:06Z
parllelize block pruning
commit e8e912daf3ada357352e006ec9ce435d4c4b1625
Author: ajantha-bhat <aj...@...>
Date: 2018-11-22T11:01:53Z
reveiw comment fix
commit d0bf82f276618f6fa09cbce65f714394b5fa4e0c
Author: ajantha-bhat <aj...@...>
Date: 2018-11-23T13:22:07Z
support parallel pruning for non-default datamaps
----
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9795/
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r241737152
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -436,4 +436,9 @@ public String toString() {
public void finish() {
}
+
+ @Override public int getNumberOfEntries() {
--- End diff --
done
---
[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r236746764
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
*/
void finish();
+ // can return , number of records information that are stored in datamap.
--- End diff --
ok, changed to just "returns"
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r240900313
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
int totalFiles) {
+ /*
+ *********************************************************************************
+ * Below is the example of how this part of code works.
+ * consider a scenario of having 5 segments, 10 datamaps in each segment,
--- End diff --
BlockDataMap and BlockletDataMap can store information for multiple files; each file is one row in the datamap. Non-default datamaps are not like that. So for default datamaps, the distribution across threads is based on the number of entries in the datamaps; for non-default datamaps, it is based on the number of datamaps (one datamap is counted as one record).
Also, "10 datamaps in a segment" means that one merge index file has the info of 10 index files.
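The distribution described above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the class `PruningDistributionSketch`, the stand-in type `DataMapStub`, and the method `distribute` are hypothetical names, and the greedy grouping is an assumption about one reasonable way to balance the work. Each datamap carries a weight (its number of entries for default datamaps, or 1 for non-default datamaps), and datamaps are packed into at most `numThreads` groups of roughly equal total weight.

```java
import java.util.ArrayList;
import java.util.List;

final class PruningDistributionSketch {

  /** Hypothetical stand-in for a datamap; only its weight matters here. */
  static final class DataMapStub {
    final String name;
    final int numberOfEntries;

    DataMapStub(String name, int numberOfEntries) {
      this.name = name;
      this.numberOfEntries = numberOfEntries;
    }
  }

  /**
   * Packs datamaps, in order, into at most numThreads groups so that each
   * group holds roughly (total entries / numThreads) entries.
   */
  static List<List<DataMapStub>> distribute(List<DataMapStub> dataMaps, int numThreads) {
    if (numThreads < 1) {
      throw new IllegalArgumentException("numThreads must be >= 1");
    }
    long total = 0;
    for (DataMapStub dataMap : dataMaps) {
      total += dataMap.numberOfEntries;
    }
    // Ceiling division: the per-thread target is rounded up, never below 1.
    long perThread = Math.max(1, (total + numThreads - 1) / numThreads);

    List<List<DataMapStub>> groups = new ArrayList<>();
    List<DataMapStub> current = new ArrayList<>();
    long currentEntries = 0;
    for (DataMapStub dataMap : dataMaps) {
      current.add(dataMap);
      currentEntries += dataMap.numberOfEntries;
      // Close the group once it reaches the target, but keep the last
      // group open so that it absorbs whatever remains.
      if (currentEntries >= perThread && groups.size() < numThreads - 1) {
        groups.add(current);
        current = new ArrayList<>();
        currentEntries = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Non-default case: 5 segments x 10 datamaps, each counted as one record.
    List<DataMapStub> dataMaps = new ArrayList<>();
    for (int segment = 0; segment < 5; segment++) {
      for (int i = 0; i < 10; i++) {
        dataMaps.add(new DataMapStub("seg" + segment + "_dm" + i, 1));
      }
    }
    for (List<DataMapStub> group : distribute(dataMaps, 4)) {
      System.out.println("group with " + group.size() + " datamaps");
    }
  }
}
```

With weights of 1, the split depends only on the number of datamaps; with real entry counts, a single large default datamap can fill a thread's share on its own.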
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9818/
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2949
LGTM
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r241279625
--- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
@@ -436,4 +436,9 @@ public String toString() {
public void finish() {
}
+
+ @Override public int getNumberOfEntries() {
--- End diff --
Move this method to the available abstract class.
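One way to read this suggestion is sketched below. All type names here (`DataMapSketch`, `CoarseGrainDataMapSketch`, `BloomDataMapSketch`) are hypothetical stand-ins rather than the actual CarbonData classes, and the default value of 1 is an assumption based on the statement elsewhere in this thread that one non-default datamap is counted as one record.

```java
/** Hypothetical stand-in for the datamap interface under review. */
interface DataMapSketch {
  /** Returns the number of records stored in this datamap. */
  int getNumberOfEntries();
}

/** Hypothetical shared base class for non-default datamaps. */
abstract class CoarseGrainDataMapSketch implements DataMapSketch {
  @Override
  public int getNumberOfEntries() {
    // Assumption: a non-default datamap is treated as a single record.
    return 1;
  }
}

/** A concrete datamap inherits the default and needs no override. */
final class BloomDataMapSketch extends CoarseGrainDataMapSketch {
}
```

Placing the default in the base class keeps each concrete datamap free of an identical override, which appears to be the point of the review comment.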
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2949
@ravipesala: PR is ready, please check.
---
[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r236571984
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
*/
void finish();
+ // can return , number of records information that are stored in datamap.
--- End diff --
"can return"?
What does this mean?
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1765/
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r236907320
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
int totalFiles) {
+ /*
+ *********************************************************************************
+ * Below is the example of how this part of code works.
+ * consider a scenario of having 5 segments, 10 datamaps in each segment,
--- End diff --
Also, what does 'record' mean below?
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9785/
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1526/
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9788/
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r241279768
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -70,4 +70,8 @@ void init(DataMapModel dataMapModel)
*/
void finish();
+ /*
+ * Returns number of records information that are stored in datamap.
+ * */
+ int getNumberOfEntries();
--- End diff --
Add a comment here explaining the purpose of this number.
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1527/
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1560/
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1740/
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r241737142
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -70,4 +70,8 @@ void init(DataMapModel dataMapModel)
*/
void finish();
+ /*
+ * Returns number of records information that are stored in datamap.
+ * */
+ int getNumberOfEntries();
--- End diff --
done
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1530/
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1747/
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2949#discussion_r236907065
--- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
@@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
int totalFiles) {
+ /*
+ *********************************************************************************
+ * Below is the example of how this part of code works.
+ * consider a scenario of having 5 segments, 10 datamaps in each segment,
--- End diff --
What do you mean by '10 datamaps in each segment'?
Do you mean 10 index files, merged index files, blocklets, or something else?
---
[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2949
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1771/
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1737/
---
[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1536/
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2949
@ajantha-bhat Please rebase
---
[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2949
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10026/
---