
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2018/06/29 02:16:11 UTC

[GitHub] sachouche commented on a change in pull request #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

URL: https://github.com/apache/drill/pull/1330#discussion_r198930977
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##########
 @@ -315,6 +315,13 @@ private ExecConstants() {
   public static final String PARQUET_FLAT_READER_BULK = "store.parquet.flat.reader.bulk";
   public static final OptionValidator PARQUET_FLAT_READER_BULK_VALIDATOR = new BooleanValidator(PARQUET_FLAT_READER_BULK);
 
+  // Controls the flat parquet reader batching constraints (number of records and memory limit)
+  public static final String PARQUET_FLAT_BATCH_NUM_RECORDS = "store.parquet.flat.batch.num_records";
 
 Review comment:
  - First of all, these constraints are meant for internal use.
  - Providing a constraint on the number of rows allows us to:
    a) cap this number (e.g., keep it below 64k-1 to avoid overflowing vectors with offsets or nullables), and
    b) let the performance team tune the best number of rows per batch. For example, the memory constraint could be 32 MB or 16 MB, yet a batch of 8k rows may be more than enough for good performance. The higher memory limit is there to handle wide selections (queries that project many columns).
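To illustrate how a row cap and a memory limit can interact, here is a minimal sketch that picks a per-batch row count from both constraints. It is not Drill's implementation: the class `BatchSizingSketch`, the method `effectiveRowCount`, the constant `ROW_LIMIT_EXCLUSIVE`, and the idea of a fixed "estimated bytes per row" are all hypothetical, chosen only to show that whichever constraint is tighter wins. The 64k-1 bound mirrors the review comment; treating it as an exclusive upper bound is an assumption.

```java
/**
 * Hypothetical sketch (not Drill's actual code): combine a configured row-count
 * cap with a memory budget to choose how many rows go into one Parquet batch.
 */
public final class BatchSizingSketch {

  // Assumed exclusive upper bound on rows per batch, following the review's
  // "below 64k-1" guidance for vectors that store offsets or nullable bits.
  public static final int ROW_LIMIT_EXCLUSIVE = 64 * 1024 - 1;

  private BatchSizingSketch() {}

  /**
   * Returns the number of rows to read into the next batch.
   *
   * @param configuredMaxRows    row cap (e.g. the value of a num_records-style option)
   * @param memoryLimitBytes     memory budget for one batch
   * @param estimatedBytesPerRow assumed average width of one projected row
   */
  public static int effectiveRowCount(int configuredMaxRows,
                                      long memoryLimitBytes,
                                      long estimatedBytesPerRow) {
    if (configuredMaxRows <= 0 || memoryLimitBytes <= 0 || estimatedBytesPerRow <= 0) {
      throw new IllegalArgumentException("all sizing inputs must be positive");
    }
    // Never let the configured cap reach the vector overflow bound.
    int rowCap = Math.min(configuredMaxRows, ROW_LIMIT_EXCLUSIVE - 1);
    // How many rows of the estimated width fit into the memory budget.
    long rowsThatFit = memoryLimitBytes / estimatedBytesPerRow;
    // The tighter constraint wins; always return at least one row so a very
    // wide row still makes progress.
    return (int) Math.max(1L, Math.min(rowCap, rowsThatFit));
  }

  public static void main(String[] args) {
    long sixteenMiB = 16L * 1024 * 1024;
    // Narrow rows: the 8k row cap is the binding constraint.
    System.out.println(effectiveRowCount(8192, sixteenMiB, 100));
    // Wide selection: the memory budget is the binding constraint.
    System.out.println(effectiveRowCount(8192, sixteenMiB, 4096));
  }
}
```

With narrow rows, the 8k row cap limits the batch even though the 16 MB budget could hold far more. With a wide selection, the memory budget takes over and shrinks the batch to 4096 rows. This matches the review's point that the two knobs are complementary rather than redundant.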

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services