You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "aokolnychyi (via GitHub)" <gi...@apache.org> on 2023/05/24 01:10:56 UTC

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7692: Core: Add writer for unordered position deletes

aokolnychyi commented on code in PR #7692:
URL: https://github.com/apache/iceberg/pull/7692#discussion_r1203260409


##########
spark/v3.4/spark/src/jmh/java/org/apache/iceberg/spark/source/WritersBenchmark.java:
##########
@@ -363,6 +389,60 @@ public void writeUnpartitionedClusteredPositionDeleteWriter(Blackhole blackhole)
     blackhole.consume(writer);
   }
 
+  @Benchmark
+  @Threads(1)
+  public void writeUnpartitionedFanoutPositionDeleteWriter(Blackhole blackhole) throws IOException {
+    FileIO io = table().io();
+
+    OutputFileFactory fileFactory = newFileFactory();
+    SparkFileWriterFactory writerFactory =
+        SparkFileWriterFactory.builderFor(table()).dataFileFormat(fileFormat()).build();
+
+    FanoutPositionOnlyDeleteWriter<InternalRow> writer =
+        new FanoutPositionOnlyDeleteWriter<>(
+            writerFactory, fileFactory, io, TARGET_FILE_SIZE_IN_BYTES);
+
+    PositionDelete<InternalRow> positionDelete = PositionDelete.create();
+    try (FanoutPositionOnlyDeleteWriter<InternalRow> closeableWriter = writer) {
+      for (InternalRow row : positionDeleteRows) {
+        String path = row.getString(0);
+        long pos = row.getLong(1);
+        positionDelete.set(path, pos, null);
+        closeableWriter.write(positionDelete, unpartitionedSpec, null);
+      }
+    }
+
+    blackhole.consume(writer);
+  }
+
+  @Benchmark
+  @Threads(1)
+  public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)

Review Comment:
   We should expect 5-15% overhead for the new buffering writer, which can still be beneficial for the job if we skip local ordering for inserts and potentially avoid spilling. This benchmark also does not take into account the cost to order records, it only tests the write performance. We will use this writer only if fanout is enabled. We should also explore Puffin delete files that would persist bitmaps directly.
   
   ```
   Benchmark                                                                                                      Mode  Cnt           Score            Error   Units
   ParquetWritersBenchmark.writeUnpartitionedClusteredPositionDeleteWriter                                          ss    5           6.004 ±          0.185    s/op
   ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter                                             ss    5           6.503 ±          0.171    s/op
   ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriterShuffled                                     ss    5           6.616 ±          0.204    s/op
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org