Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/19 23:35:16 UTC

[GitHub] [iceberg] rdblue commented on a change in pull request #3148: [ARROW] Vectorized Parquet Reads - Make ArrowReader's iterator idempotent

rdblue commented on a change in pull request #3148:
URL: https://github.com/apache/iceberg/pull/3148#discussion_r711821316



##########
File path: arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java
##########
@@ -353,6 +371,29 @@ private void readAndCheckArrowResult(
     assertEquals(expectedTotalRows, totalRows);
   }
 
+  private void readAndCheckHasNextIsIdempotent(
+      TableScan scan,
+      int numRowsPerRoot,
+      int expectedTotalRows,
+      int numExtraCallsToHasNext) throws IOException {
+    int totalRows = 0;
+    try (VectorizedTableScanIterable itr = new VectorizedTableScanIterable(scan, numRowsPerRoot, false)) {
+      CloseableIterator<ColumnarBatch> iterator = itr.iterator();
+      while (iterator.hasNext()) {
+        // Call hasNext() a few extra times.
+        // This should not affect the total number of rows read.
+        for (int i = 0; i < numExtraCallsToHasNext; i++) {
+          assertTrue(iterator.hasNext());
+        }
+
+        ColumnarBatch batch = iterator.next();
+        VectorSchemaRoot root = batch.createVectorSchemaRootFromVectors();
+        totalRows += root.getRowCount();

Review comment:
       How does this test that the iterator is idempotent? It looks like this just tests that the batch size is correct. I think that this should also call `checkAllVectorValues` to ensure that the expected rows are the ones produced, rather than relying only on the total number of rows.
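
       To illustrate the point about checking values rather than counts, here is a minimal, self-contained sketch. It does not use Iceberg or Arrow: `BatchIterator` and `drain` are hypothetical stand-ins invented for this example, and batches are plain `List<Integer>` values rather than `ColumnarBatch`. The idea is to call `hasNext()` several extra times before each `next()` and then compare the rows actually produced against the expected rows, which is the role `checkAllVectorValues` would play in the real test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class HasNextIdempotenceSketch {

  /** Hypothetical stand-in for a batch iterator; this is NOT Iceberg's ArrowReader. */
  static final class BatchIterator implements Iterator<List<Integer>> {
    private final List<Integer> rows;
    private final int batchSize;
    private int position = 0;

    BatchIterator(List<Integer> rows, int batchSize) {
      this.rows = rows;
      this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
      // Idempotent: inspects the current position but never advances it.
      return position < rows.size();
    }

    @Override
    public List<Integer> next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      int end = Math.min(position + batchSize, rows.size());
      List<Integer> batch = new ArrayList<>(rows.subList(position, end));
      position = end;
      return batch;
    }
  }

  /** Drains the iterator, calling hasNext() extra times before every next(). */
  static List<Integer> drain(Iterator<List<Integer>> iterator, int extraHasNextCalls) {
    List<Integer> seen = new ArrayList<>();
    while (iterator.hasNext()) {
      for (int i = 0; i < extraHasNextCalls; i++) {
        if (!iterator.hasNext()) {
          throw new AssertionError("hasNext() changed its answer without a call to next()");
        }
      }
      seen.addAll(iterator.next());
    }
    return seen;
  }

  public static void main(String[] args) {
    List<Integer> expected = Arrays.asList(10, 20, 30, 40, 50);
    List<Integer> actual = drain(new BatchIterator(expected, 2), 3);
    // Compare the rows themselves, not just how many were read.
    if (!actual.equals(expected)) {
      throw new AssertionError("expected " + expected + " but read " + actual);
    }
    System.out.println("rows match: " + actual);
  }
}
```

       Comparing the full list catches cases a row count cannot, such as an iterator that returns the right number of rows but with wrong or repeated values.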




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


