Posted to issues@carbondata.apache.org by ajantha-bhat <gi...@git.apache.org> on 2018/11/21 02:45:09 UTC

[GitHub] carbondata pull request #2936: [WIP] Parallelize block pruning of default da...

GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2936

    [WIP] Parallelize block pruning of default datamap in driver for filter query processing

    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata issue_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2936.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2936
    
----
commit 7e2d16903565effab3c1c085291178865f6ad7ba
Author: ajantha-bhat <aj...@...>
Date:   2018-11-20T16:45:06Z

    pruning compliling

commit 059b69b46d5e543ba4a65af85ce694a1899de395
Author: ajantha-bhat <aj...@...>
Date:   2018-11-21T02:27:52Z

    issue fix

----


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1738/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Resolved a conflict with the documentation file. Let the build run again.


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1754/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    retest this please


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1470/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9775/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1709/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1509/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9746/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9731/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1719/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9754/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1687/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    LGTM


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237174006
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -63,6 +75,8 @@
     
       private SegmentPropertiesFetcher segmentPropertiesFetcher;
     
    +  private static final Log LOG = LogFactory.getLog(TableDataMap.class);
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    retest this please


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9728/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9786/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1683/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9735/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9787/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1727/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1517/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235774840
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    +    }
    +    // for filter queries
    +    int totalFiles = 0;
    +    boolean isBlockDataMapType = true;
    +    for (Segment segment : segments) {
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        if (!(dataMap instanceof BlockDataMap)) {
    --- End diff --
    
    Two reasons:
    
    1. If it is not a block or blocklet datamap, the number of datamaps will be very small. Hence multi-threading is not required (it would add overhead on the driver in concurrent scenarios).
    
    2. Other datamaps don't carry an entry count.
    
    I will check
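
    The type-gate described above can be sketched as follows. This is a hedged, minimal illustration: all types here are illustrative stand-ins, not CarbonData's real classes. Multi-threaded pruning is attempted only when every datamap is a BlockDataMap, because only those expose a block (file) count.

    ```java
    // Minimal sketch, assuming stand-in types; not CarbonData's real classes.
    import java.util.Arrays;
    import java.util.List;

    public class PruningGate {
      interface DataMap { }

      // stand-in for BlockDataMap, which knows how many blocks (files) it indexes
      static class BlockDataMap implements DataMap {
        private final int totalBlocks;
        BlockDataMap(int totalBlocks) { this.totalBlocks = totalBlocks; }
        int getTotalBlocks() { return totalBlocks; }
      }

      // stand-in for a non-default datamap (e.g. lucene, bloom): no entry count
      static class OtherDataMap implements DataMap { }

      /**
       * Returns the total file count if all datamaps are BlockDataMaps,
       * else -1 to signal that the single-threaded flow should be used.
       */
      static int totalFilesIfBlockDataMaps(List<? extends DataMap> dataMaps) {
        int totalFiles = 0;
        for (DataMap dm : dataMaps) {
          if (!(dm instanceof BlockDataMap)) {
            return -1; // other datamaps cannot report a count; skip multi-threading
          }
          totalFiles += ((BlockDataMap) dm).getTotalBlocks();
        }
        return totalFiles;
      }

      public static void main(String[] args) {
        System.out.println(totalFilesIfBlockDataMaps(
            Arrays.asList(new BlockDataMap(3), new BlockDataMap(5)))); // 8
        System.out.println(totalFilesIfBlockDataMaps(
            Arrays.asList(new BlockDataMap(3), new OtherDataMap()))); // -1
      }
    }
    ```

    A -1 result corresponds to falling back to the original sequential pruning flow.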


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1739/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1497/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1706/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1505/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9742/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1519/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9777/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1699/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1680/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9802/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    retest this please


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236564769
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -63,6 +75,8 @@
     
       private SegmentPropertiesFetcher segmentPropertiesFetcher;
     
    +  private static final Log LOG = LogFactory.getLog(TableDataMap.class);
    --- End diff --
    
    We do not use Apache commons-logging in the carbondata project! Please take care of this


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237173860
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    --- End diff --
    
    I have another PR for non-default datamaps; I will check this there. I feel this name is also OK


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235611496
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    --- End diff --
    
    Please check what is the time taken to get all blocks in case of millions of files. If it takes more time then we may need to parallelize this also.


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1528/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236568719
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -487,6 +487,8 @@ private int getBlockCount(List<ExtendedBlocklet> blocklets) {
         // First prune using default datamap on driver side.
         TableDataMap defaultDataMap = DataMapStoreManager.getInstance().getDefaultDataMap(carbonTable);
         List<ExtendedBlocklet> prunedBlocklets = null;
    +    // This is to log the event, so user will know what is happening by seeing logs.
    +    LOG.info("Started block pruning ...");
    --- End diff --
    
    Instead of adding these logs, I think we'd better add the time consumed for pruning to the statistics.


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236565153
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    --- End diff --
    
    I think it's better to use the name
    `carbon.query.pruning.parallelism.driver`
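    
    Whatever the final name, the javadoc in the diff caps driver pruning threads at 1 to 4. A minimal sketch of enforcing such a bound, assuming a parse-and-clamp approach (the fallback logic is an assumption, not CarbonData's actual implementation):
    
    ```java
    // Hedged sketch: the property key is from the diff; parsing and the
    // default value are assumptions for illustration only.
    public class PruningThreads {
      static final String KEY = "carbon.max.driver.threads.for.block.pruning";
      static final int DEFAULT_THREADS = 4;

      /** Clamp the configured value to the documented [1, 4] range. */
      static int numOfThreadsForPruning(String configured) {
        int n;
        try {
          n = Integer.parseInt(configured);
        } catch (NumberFormatException e) {
          return DEFAULT_THREADS; // fall back on unparseable input
        }
        return Math.max(1, Math.min(4, n));
      }

      public static void main(String[] args) {
        System.out.println(numOfThreadsForPruning("2"));  // 2
        System.out.println(numOfThreadsForPruning("99")); // clamped to 4
        System.out.println(numOfThreadsForPruning("0"));  // clamped to 1
      }
    }
    ```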


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237173126
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -487,6 +487,8 @@ private int getBlockCount(List<ExtendedBlocklet> blocklets) {
         // First prune using default datamap on driver side.
         TableDataMap defaultDataMap = DataMapStoreManager.getInstance().getDefaultDataMap(carbonTable);
         List<ExtendedBlocklet> prunedBlocklets = null;
    +    // This is to log the event, so user will know what is happening by seeing logs.
    +    LOG.info("Started block pruning ...");
    --- End diff --
    
    The log will anyway have a timestamp; we can subtract the start time from the stop time. I have another PR for non-default datamaps; I will check this there.
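    
    The explicit stop-minus-start timing mentioned here can be sketched as below; the names are illustrative and this is not CarbonData's statistics API:
    
    ```java
    // Minimal sketch, assuming an explicit timer around the pruning call
    // instead of relying on log timestamps.
    public class PruneTiming {
      /** Run the pruning work and return the elapsed wall-clock time in ms. */
      static long timedPruneMillis(Runnable prune) {
        long start = System.currentTimeMillis();
        prune.run();
        return System.currentTimeMillis() - start;
      }

      public static void main(String[] args) {
        long elapsed = timedPruneMillis(() -> {
          // stand-in for defaultDataMap.prune(...)
          try { Thread.sleep(10); } catch (InterruptedException e) { }
        });
        System.out.println("Block pruning took " + elapsed + " ms");
      }
    }
    ```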


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    @ravipesala : PR is ready. please check


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1543/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9755/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1500/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9762/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235611698
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    +    }
    +    // for filter queries
    +    int totalFiles = 0;
    +    boolean isBlockDataMapType = true;
    +    for (Segment segment : segments) {
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        if (!(dataMap instanceof BlockDataMap)) {
    --- End diff --
    
    This flow can be used by all datamaps why only for blockdatamap?


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1484/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1694/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1476/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    @manishgupta88 , @ravipesala : please review


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1707/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1729/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1488/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235842114
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    +    }
    +    // for filter queries
    +    int totalFiles = 0;
    +    boolean isBlockDataMapType = true;
    +    for (Segment segment : segments) {
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        if (!(dataMap instanceof BlockDataMap)) {
    +          isBlockDataMapType = false;
    +          break;
    +        }
    +        totalFiles += ((BlockDataMap) dataMap).getTotalBlocks();
    +      }
    +      if (!isBlockDataMapType) {
    +        // totalFiles will be 0 for non-BlockDataMap type. ex: lucene, bloom datamap. use old flow.
    +        break;
    +      }
    +    }
    +    int numOfThreadsForPruning = getNumOfThreadsForPruning();
    +    int filesPerEachThread = totalFiles / numOfThreadsForPruning;
    +    if (numOfThreadsForPruning == 1 || filesPerEachThread == 1
    +        || segments.size() < numOfThreadsForPruning || totalFiles
    +        < CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT) {
    +      // fall back to single-threaded pruning; multi-threading is used only
    +      // when the files exceed 0.1 million, since pruning 0.1 million files
    +      // takes only about 1 second. Multi-threading smaller workloads is not
    +      // recommended, as the driver should keep threads free to support
    +      // multiple concurrent queries.
    +      return pruneWithFilter(segments, filterExp, partitions, blocklets, dataMaps);
    +    }
    +    // handle by multi-thread
    +    return pruneWithFilterMultiThread(segments, filterExp, partitions, blocklets, dataMaps,
    +        totalFiles);
    +  }
    +
    +  private List<ExtendedBlocklet> pruneWithoutFilter(List<Segment> segments,
    +      List<PartitionSpec> partitions, List<ExtendedBlocklet> blocklets) throws IOException {
    +    for (Segment segment : segments) {
    +      List<Blocklet> allBlocklets = blockletDetailsFetcher.getAllBlocklets(segment, partitions);
    +      blocklets.addAll(
    +          addSegmentId(blockletDetailsFetcher.getExtendedBlocklets(allBlocklets, segment),
    +              segment.toString()));
    +    }
    +    return blocklets;
    +  }
    +
    +  private List<ExtendedBlocklet> pruneWithFilter(List<Segment> segments,
    +      FilterResolverIntf filterExp, List<PartitionSpec> partitions,
    +      List<ExtendedBlocklet> blocklets, Map<Segment, List<DataMap>> dataMaps) throws IOException {
         for (Segment segment : segments) {
           List<Blocklet> pruneBlocklets = new ArrayList<>();
    -      // if filter is not passed then return all the blocklets
    -      if (filterExp == null) {
    -        pruneBlocklets = blockletDetailsFetcher.getAllBlocklets(segment, partitions);
    -      } else {
    -        segmentProperties = segmentPropertiesFetcher.getSegmentProperties(segment);
    -        for (DataMap dataMap : dataMaps.get(segment)) {
    -          pruneBlocklets.addAll(dataMap.prune(filterExp, segmentProperties, partitions));
    +      SegmentProperties segmentProperties = segmentPropertiesFetcher.getSegmentProperties(segment);
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        pruneBlocklets.addAll(dataMap.prune(filterExp, segmentProperties, partitions));
    +      }
    +      blocklets.addAll(
    +          addSegmentId(blockletDetailsFetcher.getExtendedBlocklets(pruneBlocklets, segment),
    +              segment.toString()));
    +    }
    +    return blocklets;
    +  }
    +
    +  private List<ExtendedBlocklet> pruneWithFilterMultiThread(List<Segment> segments,
    +      final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
    +      List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
    +      int totalFiles) {
    +    int numOfThreadsForPruning = getNumOfThreadsForPruning();
    +    int filesPerEachThread = (int) Math.ceil((double)totalFiles / numOfThreadsForPruning);
    +    int prev = 0;
    +    int filesCount = 0;
    +    int processedFileCount = 0;
    +    List<List<Segment>> segmentList = new ArrayList<>();
    --- End diff --
    
    done


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1698/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1714/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235612449
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    +    }
    +    // for filter queries
    +    int totalFiles = 0;
    +    boolean isBlockDataMapType = true;
    +    for (Segment segment : segments) {
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        if (!(dataMap instanceof BlockDataMap)) {
    +          isBlockDataMapType = false;
    +          break;
    +        }
    +        totalFiles += ((BlockDataMap) dataMap).getTotalBlocks();
    +      }
    +      if (!isBlockDataMapType) {
    +        // totalFiles will be 0 for non-BlockDataMap types, e.g. lucene or bloom datamaps; use the old flow.
    +        break;
    +      }
    +    }
    +    int numOfThreadsForPruning = getNumOfThreadsForPruning();
    +    int filesPerEachThread = totalFiles / numOfThreadsForPruning;
    +    if (numOfThreadsForPruning == 1 || filesPerEachThread == 1
    +        || segments.size() < numOfThreadsForPruning || totalFiles
    +        < CarbonCommonConstants.CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT) {
    +      // Fall back to single-threaded pruning. Multi-threading is used only
    +      // when there are more than 0.1 million files, since pruning 0.1 million
    +      // files takes only about 1 second. Multi-threading for smaller counts is
    +      // not recommended, as the driver should keep a minimum number of threads
    +      // open to support multiple concurrent queries.
    +      return pruneWithFilter(segments, filterExp, partitions, blocklets, dataMaps);
    +    }
    +    // handle by multi-thread
    +    return pruneWithFilterMultiThread(segments, filterExp, partitions, blocklets, dataMaps,
    +        totalFiles);
    +  }
    +
    +  private List<ExtendedBlocklet> pruneWithoutFilter(List<Segment> segments,
    +      List<PartitionSpec> partitions, List<ExtendedBlocklet> blocklets) throws IOException {
    +    for (Segment segment : segments) {
    +      List<Blocklet> allBlocklets = blockletDetailsFetcher.getAllBlocklets(segment, partitions);
    +      blocklets.addAll(
    +          addSegmentId(blockletDetailsFetcher.getExtendedBlocklets(allBlocklets, segment),
    +              segment.toString()));
    +    }
    +    return blocklets;
    +  }
    +
    +  private List<ExtendedBlocklet> pruneWithFilter(List<Segment> segments,
    +      FilterResolverIntf filterExp, List<PartitionSpec> partitions,
    +      List<ExtendedBlocklet> blocklets, Map<Segment, List<DataMap>> dataMaps) throws IOException {
         for (Segment segment : segments) {
           List<Blocklet> pruneBlocklets = new ArrayList<>();
    -      // if filter is not passed then return all the blocklets
    -      if (filterExp == null) {
    -        pruneBlocklets = blockletDetailsFetcher.getAllBlocklets(segment, partitions);
    -      } else {
    -        segmentProperties = segmentPropertiesFetcher.getSegmentProperties(segment);
    -        for (DataMap dataMap : dataMaps.get(segment)) {
    -          pruneBlocklets.addAll(dataMap.prune(filterExp, segmentProperties, partitions));
    +      SegmentProperties segmentProperties = segmentPropertiesFetcher.getSegmentProperties(segment);
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        pruneBlocklets.addAll(dataMap.prune(filterExp, segmentProperties, partitions));
    +      }
    +      blocklets.addAll(
    +          addSegmentId(blockletDetailsFetcher.getExtendedBlocklets(pruneBlocklets, segment),
    +              segment.toString()));
    +    }
    +    return blocklets;
    +  }
    +
    +  private List<ExtendedBlocklet> pruneWithFilterMultiThread(List<Segment> segments,
    +      final FilterResolverIntf filterExp, final List<PartitionSpec> partitions,
    +      List<ExtendedBlocklet> blocklets, final Map<Segment, List<DataMap>> dataMaps,
    +      int totalFiles) {
    +    int numOfThreadsForPruning = getNumOfThreadsForPruning();
    +    int filesPerEachThread = (int) Math.ceil((double)totalFiles / numOfThreadsForPruning);
    +    int prev = 0;
    +    int filesCount = 0;
    +    int processedFileCount = 0;
    +    List<List<Segment>> segmentList = new ArrayList<>();
    --- End diff --
    
    I feel the splitting should happen per datamap rather than per segment. One segment can have a million files in the case of a big load, so please try parallel execution of datamap pruning at the datamap level.
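
    To make the suggestion concrete, here is a minimal, hypothetical sketch of distributing the pruning work per datamap instead of per segment. The greedy grouping below is illustrative only and does not use real CarbonData APIs; plain file counts stand in for what `BlockDataMap.getTotalBlocks()` would report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a flat list of per-datamap file counts into
// roughly equal-sized groups, one per pruning thread. Because the unit of
// work is a datamap (not a segment), a single huge segment with many
// datamaps still spreads across threads.
public class DataMapSplitSketch {

  // Greedy split: walk the datamaps in order and close the current group
  // once its running file count reaches filesPerThread.
  static List<List<Integer>> split(List<Integer> fileCountPerDataMap, int numThreads) {
    int totalFiles = 0;
    for (int c : fileCountPerDataMap) {
      totalFiles += c;
    }
    int filesPerThread = (int) Math.ceil((double) totalFiles / numThreads);
    List<List<Integer>> groups = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int running = 0;
    for (int c : fileCountPerDataMap) {
      current.add(c);
      running += c;
      if (running >= filesPerThread) {
        groups.add(current);
        current = new ArrayList<>();
        running = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Six datamaps (possibly all from one big segment), four threads.
    List<Integer> counts = List.of(400, 300, 300, 500, 250, 250);
    List<List<Integer>> groups = split(counts, 4);
    System.out.println(groups.size()); // prints 3
  }
}
```

    Each group would then be handed to one pruning thread; a fancier bin-packing could balance the groups more evenly, but a greedy split keeps the driver-side bookkeeping cheap.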


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Failed  with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9757/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    LGTM


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9747/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2936


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r237173725
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    +
    +  public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING_DEFAULT = "4";
    +
    +  // block prune in multi-thread if files size more than 100K files.
    +  public static final int CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT = 100000;
    --- End diff --
    
    Because making the driver multi-threaded by default may impact concurrent queries. Also, testing showed that pruning 100k datamap entries takes about 1 second, so multi-threading is enabled only when block pruning would take more than a second.
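
    The guard being discussed can be restated as a small decision helper. This is a hypothetical sketch of the condition in `TableDataMap.prune`, using the 100k threshold from `CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT`; the method and class names are illustrative, not CarbonData APIs.

```java
// Sketch of the driver-side decision: only go multi-threaded when pruning
// would otherwise exceed roughly one second (about 100k files).
public class PruningThreadDecision {
  static final int MULTI_THREAD_ENABLE_FILES_COUNT = 100_000;

  static boolean useMultiThread(int totalFiles, int numSegments, int numThreads) {
    if (numThreads <= 1) {
      return false;
    }
    int filesPerThread = totalFiles / numThreads;
    // Mirrors the guard in the PR: small workloads stay single-threaded so
    // the driver keeps threads free for concurrent queries.
    return filesPerThread > 1
        && numSegments >= numThreads
        && totalFiles >= MULTI_THREAD_ENABLE_FILES_COUNT;
  }

  public static void main(String[] args) {
    System.out.println(useMultiThread(250_000, 8, 4)); // prints true
    System.out.println(useMultiThread(50_000, 8, 4));  // prints false
  }
}
```

    The 100k cutoff is a tuning choice: below it, single-threaded pruning already finishes in well under a second, so spawning extra driver threads costs more than it saves.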


---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235846200
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    +    }
    +    // for filter queries
    +    int totalFiles = 0;
    +    boolean isBlockDataMapType = true;
    +    for (Segment segment : segments) {
    +      for (DataMap dataMap : dataMaps.get(segment)) {
    +        if (!(dataMap instanceof BlockDataMap)) {
    --- End diff --
    
    For this one, I have to figure out the number of entries in all kinds of datamaps and test those scenarios. I will handle it in a follow-up PR.


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1515/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9773/



---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    please describe this PR


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1489/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1498/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r236565449
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1399,6 +1399,17 @@ private CarbonCommonConstants() {
     
       public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
     
    +  /**
    +   * max driver threads used for block pruning [1 to 4 threads]
    +   */
    +  @CarbonProperty public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING =
    +      "carbon.max.driver.threads.for.block.pruning";
    +
    +  public static final String CARBON_MAX_DRIVER_THREADS_FOR_BLOCK_PRUNING_DEFAULT = "4";
    +
    +  // block prune in multi-thread if files size more than 100K files.
    +  public static final int CARBON_DRIVER_PRUNING_MULTI_THREAD_ENABLE_FILES_COUNT = 100000;
    --- End diff --
    
    Why add this constraint?


---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9767/



---

[GitHub] carbondata pull request #2936: [CARBONDATA-3118] Parallelize block pruning o...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2936#discussion_r235615112
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java ---
    @@ -120,37 +132,166 @@ public BlockletDetailsFetcher getBlockletDetailsFetcher() {
        * @param filterExp
        * @return
        */
    -  public List<ExtendedBlocklet> prune(List<Segment> segments, FilterResolverIntf filterExp,
    -      List<PartitionSpec> partitions) throws IOException {
    -    List<ExtendedBlocklet> blocklets = new ArrayList<>();
    -    SegmentProperties segmentProperties;
    -    Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +  public List<ExtendedBlocklet> prune(List<Segment> segments, final FilterResolverIntf filterExp,
    +      final List<PartitionSpec> partitions) throws IOException {
    +    final List<ExtendedBlocklet> blocklets = new ArrayList<>();
    +    final Map<Segment, List<DataMap>> dataMaps = dataMapFactory.getDataMaps(segments);
    +    // for non-filter queries
    +    if (filterExp == null) {
    +      // if filter is not passed, then return all the blocklets.
    +      return pruneWithoutFilter(segments, partitions, blocklets);
    --- End diff --
    
    Yes, already tested this. For 100k files, pruning with a filter takes around 1 second, but without a filter it takes only about 50 ms, which is very little. Hence the no-filter case is not handled here.
    
    For filter queries, pruning was taking noticeable time, so that is the case being parallelized.


---

[GitHub] carbondata issue #2936: [WIP] Parallelize block pruning of default datamap i...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1473/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1529/



---

[GitHub] carbondata issue #2936: [CARBONDATA-3118] Parallelize block pruning of defau...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2936
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1725/



---