Posted to issues@carbondata.apache.org by ajantha-bhat <gi...@git.apache.org> on 2018/11/23 13:22:36 UTC

[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2949

    [WIP] support parallel block pruning for non-default datamaps

    [WIP] support parallel block pruning for non-default datamaps
    
    This PR depends on #2936.
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata working_backup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2949
    
----
commit 6237d69fcc0ddc1a08c74579762b721108a251fe
Author: ajantha-bhat <aj...@...>
Date:   2018-11-20T16:45:06Z

    parllelize block pruning

commit e8e912daf3ada357352e006ec9ce435d4c4b1625
Author: ajantha-bhat <aj...@...>
Date:   2018-11-22T11:01:53Z

    reveiw comment fix

commit d0bf82f276618f6fa09cbce65f714394b5fa4e0c
Author: ajantha-bhat <aj...@...>
Date:   2018-11-23T13:22:07Z

    support parallel pruning for non-default datamaps

----


---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9795/



---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r241737152
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -436,4 +436,9 @@ public String toString() {
       public void finish() {
     
       }
    +
    +  @Override public int getNumberOfEntries() {
    --- End diff --
    
    done


---

[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236746764
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
        */
       void finish();
     
    +  // can return , number of records information that are stored in datamap.
    --- End diff --
    
    ok, changed to just "returns"


---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r240900313
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
           final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
           List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
           int totalFiles) {
    +    /*
    +     *********************************************************************************
    +     * Below is the example of how this part of code works.
    +     * consider a scenario of having 5 segments, 10 datamaps in each segment,
    --- End diff --
    
    BlockDatamap and blockletDatamap can store information for multiple files; each file is one row (entry) in that datamap. Non-default datamaps do not work that way. So for default datamaps, distribution across threads is based on the number of entries in the datamaps, while for non-default datamaps it is based on the number of datamaps (one datamap is counted as one record).
    
    Also, "10 datamaps in a segment" means that one merge index file holds the information of 10 index files.
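
The distribution rule described in this comment can be illustrated with a minimal sketch. It is not the code from this pull request: the class `PruningDistributionSketch` and the method `distribute` are hypothetical names, the greedy fill strategy is an assumption, and `numThreads` is assumed to be at least 1. Each datamap is given a weight: its entry count for default datamaps, or 1 for non-default datamaps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are not taken from CarbonData.
class PruningDistributionSketch {

  // Splits datamaps (identified by their index in the array) into at most
  // numThreads groups so that each group holds roughly the same number of entries.
  // Assumes numThreads >= 1.
  static List<List<Integer>> distribute(int[] entriesPerDataMap, int numThreads) {
    long total = 0;
    for (int entries : entriesPerDataMap) {
      total += entries;
    }
    // Ceiling division: target number of entries per thread (at least 1).
    long target = Math.max(1, (total + numThreads - 1) / numThreads);

    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long currentEntries = 0;
    for (int i = 0; i < entriesPerDataMap.length; i++) {
      current.add(i);
      currentEntries += entriesPerDataMap[i];
      // Close the group once it reaches the target, unless it is the last allowed group.
      if (currentEntries >= target && groups.size() < numThreads - 1) {
        groups.add(current);
        current = new ArrayList<>();
        currentEntries = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Default datamaps: weights are the per-datamap entry counts.
    System.out.println(distribute(new int[] {10, 1, 1, 8}, 2));
    // Non-default datamaps: every datamap counts as one record.
    System.out.println(distribute(new int[] {1, 1, 1, 1}, 2));
  }
}
```

With entry counts as weights, one large datamap can fill a thread's share on its own; with every weight set to 1, the datamaps are simply split evenly by count.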


---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9818/



---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    LGTM


---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r241279625
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java ---
    @@ -436,4 +436,9 @@ public String toString() {
       public void finish() {
     
       }
    +
    +  @Override public int getNumberOfEntries() {
    --- End diff --
    
    Move this method to the available abstract class.


---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    @ravipesala: The PR is ready, please check.


---

[GitHub] carbondata pull request #2949: [WIP] support parallel block pruning for non-...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236571984
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -70,4 +70,6 @@ void init(DataMapModel dataMapModel)
        */
       void finish();
     
    +  // can return , number of records information that are stored in datamap.
    --- End diff --
    
    "can return"?
    What does this mean?


---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1765/



---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236907320
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
           final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
           List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
           int totalFiles) {
    +    /*
    +     *********************************************************************************
    +     * Below is the example of how this part of code works.
    +     * consider a scenario of having 5 segments, 10 datamaps in each segment,
    --- End diff --
    
    Also what does the 'record' mean below?


---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9785/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1526/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9788/



---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r241279768
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -70,4 +70,8 @@ void init(DataMapModel dataMapModel)
        */
       void finish();
     
    +  /*
    +  * Returns number of records information that are stored in datamap.
    +  * */
    +  int getNumberOfEntries();
    --- End diff --
    
    Add a comment here explaining the purpose of this number.


---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1527/



---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1560/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1740/



---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r241737142
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
    @@ -70,4 +70,8 @@ void init(DataMapModel dataMapModel)
        */
       void finish();
     
    +  /*
    +  * Returns number of records information that are stored in datamap.
    +  * */
    +  int getNumberOfEntries();
    --- End diff --
    
    done


---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1530/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1747/



---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2949#discussion_r236907065
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -205,26 +195,53 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
           final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
           List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
           int totalFiles) {
    +    /*
    +     *********************************************************************************
    +     * Below is the example of how this part of code works.
    +     * consider a scenario of having 5 segments, 10 datamaps in each segment,
    --- End diff --
    
    What do you mean by '10 datamaps in each segment'?
    Do you mean 10 index files, merged index files, blocklets, or something else?


---

[GitHub] carbondata pull request #2949: [CARBONDATA-3118] support parallel block prun...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2949


---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1771/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1737/



---

[GitHub] carbondata issue #2949: [WIP] support parallel block pruning for non-default...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1536/



---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    @ajantha-bhat Please rebase


---

[GitHub] carbondata issue #2949: [CARBONDATA-3118] support parallel block pruning for...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2949
  
    Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10026/



---