Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/19 12:58:08 UTC

[GitHub] [doris] dujl commented on pull request #10843: [feature] (multi-catalog) read parquet file by start/offset

dujl commented on PR #10843:
URL: https://github.com/apache/doris/pull/10843#issuecomment-1189021264

   @wsjz @morningman Our Parquet alignment strategy differs from the Parquet community's.
   The Parquet community checks whether each row group's midpoint falls within the scan range.
   If the midpoint is in the scan range, the row group is added to the scan list.
   I suggest that we align with the Parquet community.
   
   For Parquet:
   ```
         long midPoint = startIndex + totalSize / 2;
         if (filter.contains(midPoint)) {
           newRowGroups.add(rowGroup);
         }
   ```
   ```
       public boolean contains(long offset) {
         return offset >= this.startOffset && offset < this.endOffset;
       }
   ```
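
To illustrate why the midpoint rule is attractive, here is a minimal, self-contained sketch of the check quoted above. The class names `MidpointSplitSketch`, `RowGroup`, and `ScanRange`, and the sample offsets, are hypothetical and only for illustration; this is not the actual Parquet or Doris code. It assumes the scan ranges are contiguous, non-overlapping, and half-open, in which case each row group's midpoint lands in exactly one range, so every row group is read by exactly one scanner.

```java
import java.util.ArrayList;
import java.util.List;

public class MidpointSplitSketch {
    /** Hypothetical stand-in for row-group metadata: start byte offset and total size. */
    static final class RowGroup {
        final long startIndex;
        final long totalSize;

        RowGroup(long startIndex, long totalSize) {
            this.startIndex = startIndex;
            this.totalSize = totalSize;
        }
    }

    /** Half-open scan range [startOffset, endOffset), mirroring contains() above. */
    static final class ScanRange {
        final long startOffset;
        final long endOffset;

        ScanRange(long startOffset, long endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        boolean contains(long offset) {
            return offset >= startOffset && offset < endOffset;
        }
    }

    /** Keeps only the row groups whose midpoint falls inside the given scan range. */
    static List<RowGroup> selectRowGroups(List<RowGroup> rowGroups, ScanRange range) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup rowGroup : rowGroups) {
            long midPoint = rowGroup.startIndex + rowGroup.totalSize / 2;
            if (range.contains(midPoint)) {
                selected.add(rowGroup);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three 100-byte row groups with midpoints at 50, 150, and 250.
        List<RowGroup> rowGroups = List.of(
                new RowGroup(0, 100), new RowGroup(100, 100), new RowGroup(200, 100));

        // Two contiguous scan ranges whose boundary (120) cuts through the second row group.
        ScanRange first = new ScanRange(0, 120);
        ScanRange second = new ScanRange(120, 300);

        System.out.println(selectRowGroups(rowGroups, first).size());
        System.out.println(selectRowGroups(rowGroups, second).size());
    }
}
```

In this sample, the second row group straddles the boundary at offset 120, but its midpoint (150) belongs only to the second range, so it is neither skipped nor read twice.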


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@doris.apache.org