You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/31 06:24:00 UTC

[GitHub] [drill] paul-rogers commented on a change in pull request #2441: DRILL-8115: Adds LIMIT pushdown support in the scan framework

paul-rogers commented on a change in pull request #2441:
URL: https://github.com/apache/drill/pull/2441#discussion_r795373443



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/ReaderLifecycle.java
##########
@@ -282,8 +302,8 @@ private void endBatch() {
     if (tableLoader.batchCount() == 1 || prevTableSchemaVersion < tableLoader.schemaVersion()) {
       reviseOutputProjection(tableLoader.outputSchema());
     }
-    buildOutputBatch(readerOutput);
-    scanLifecycle.tallyBatch();
+    int rowCount = buildOutputBatch(readerOutput);

Review comment:
       In general, I tried to use `int` for the row count *within* a batch, and `long` for the row count *across* batches. Any one batch can have no more than 64K rows. But, a result set could potentially have 100B or 1T rows total across a large number of batches.
   
   So, the limit for a scan (or for a reader) should be a `long`. Now, I'm not sure why anyone would use `LIMIT 4000000000`, but better safe than sorry.
   
   If you see the use of `long` for batch counts, or `int` for result set counts, please call it out as it is probably an error.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/ExtendedMockBatchReader.java
##########
@@ -140,7 +133,7 @@ public boolean next() {
       writer.save();
     }
 
-    return true;
+    return !loader.atLimit();

Review comment:
       That is correct. This particular test uses a subset of EVF: just the `ResultSetLoader`, which is responsible for enforcing the per-reader limit. In the case of this mock reader for testing, there is only one reader, so enforcing the per-reader limit is equivalent to enforcing the scan limit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org