You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by "singhpk234 (via GitHub)" <gi...@apache.org> on 2023/04/24 20:17:15 UTC

[GitHub] [iceberg] singhpk234 opened a new pull request, #7422: Spark 3.4: Support rate limit in Spark Streaming

singhpk234 opened a new pull request, #7422:
URL: https://github.com/apache/iceberg/pull/7422

   ### About the change 
   
   Forward port https://github.com/apache/iceberg/pull/4479 to spark 3.4
   
   cc @jackye1995 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1520774194

   Output of : 
   
   `git diff --no-index spark/v3.3/spark/src/ spark/v3.4/spark/src --name-only`
   
   
   spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaOperation.java
   spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java
   spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWriteBuilder.java
   spark/v3.4/spark/src/main/java/org/apache/spark/sql/catalyst/analysis/NoSuchProcedureException.java
   /dev/null
   /dev/null
   /dev/null
   /dev/null
   /dev/null
   /dev/null
   /dev/null
   spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestFunctionCatalog.java
   spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestRequiredDistributionAndOrdering.java
   spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java
   spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/sql/TestDropTable.java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] szehon-ho commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.

szehon-ho commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1528682529

   Yea this test fail for my pr run as well.. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.

aokolnychyi commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1175813831


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##########
@@ -80,6 +85,8 @@ public class SparkMicroBatchStream implements MicroBatchStream {
   private final boolean skipDelete;
   private final boolean skipOverwrite;
   private final Long fromTimestamp;
+  private final Integer maxFilesPerMicroBatch;

Review Comment:
   Same question here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 merged pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.

jackye1995 merged PR #7422:
URL: https://github.com/apache/iceberg/pull/7422


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] wypoon commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "wypoon (via GitHub)" <gi...@apache.org>.

wypoon commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1528828127

   Thanks @singhpk234.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.

jackye1995 commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1526058695

   @singhpk234 could you rebase the PR based on the refactoring?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1528696580

   Added a pr for the fix.
   
   cc @szehon-ho @wypoon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.

jackye1995 commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1527910444

   I think the comments are all addressed, will go ahead and merge it. Thanks everyone for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1176744004


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   Added a pr for the same https://github.com/apache/iceberg/pull/7429/files



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1179319081


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##########
@@ -295,6 +309,139 @@ private static StreamingOffset determineStartingOffset(Table table, Long fromTim
     }
   }
 
+  @Override
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
+  public Offset latestOffset(Offset startOffset, ReadLimit limit) {
+    // calculate end offset get snapshotId from the startOffset
+    Preconditions.checkArgument(
+        startOffset instanceof StreamingOffset,
+        "Invalid start offset: %s is not a StreamingOffset",
+        startOffset);
+
+    table.refresh();
+    if (table.currentSnapshot() == null) {
+      return StreamingOffset.START_OFFSET;
+    }
+
+    if (table.currentSnapshot().timestampMillis() < fromTimestamp) {
+      return StreamingOffset.START_OFFSET;
+    }
+
+    // end offset can expand to multiple snapshots
+    StreamingOffset startingOffset = (StreamingOffset) startOffset;
+
+    if (startOffset.equals(StreamingOffset.START_OFFSET)) {
+      startingOffset = determineStartingOffset(table, fromTimestamp);
+    }
+
+    Snapshot curSnapshot = table.snapshot(startingOffset.snapshotId());
+    validateCurrentSnapshotExists(curSnapshot, startingOffset);
+
+    int startPosOfSnapOffset = (int) startingOffset.position();
+
+    boolean scanAllFiles = startingOffset.shouldScanAllFiles();
+
+    boolean shouldContinueReading = true;
+    int curFilesAdded = 0;
+    int curRecordCount = 0;
+    int curPos = 0;
+
+    // Note : we produce nextOffset with pos as non-inclusive
+    while (shouldContinueReading) {
+      // generate manifest index for the curSnapshot
+      List<Pair<ManifestFile, Integer>> indexedManifests =
+          MicroBatches.skippedManifestIndexesFromSnapshot(
+              table.io(), curSnapshot, startPosOfSnapOffset, scanAllFiles);
+      // this is under assumption we will be able to add at-least 1 file in the new offset
+      for (int idx = 0; idx < indexedManifests.size() && shouldContinueReading; idx++) {
+        // be rest assured curPos >= startFileIndex
+        curPos = indexedManifests.get(idx).second();
+        try (CloseableIterable<FileScanTask> taskIterable =
+                MicroBatches.openManifestFile(
+                    table.io(),
+                    table.specs(),
+                    caseSensitive,
+                    curSnapshot,
+                    indexedManifests.get(idx).first(),
+                    scanAllFiles);
+            CloseableIterator<FileScanTask> taskIter = taskIterable.iterator()) {
+          while (taskIter.hasNext()) {
+            FileScanTask task = taskIter.next();
+            if (curPos >= startPosOfSnapOffset) {
+              // TODO : use readLimit provided in function param, the readLimits are derived from
+              // these 2 properties.
+              if ((curFilesAdded + 1) > maxFilesPerMicroBatch
+                  || (curRecordCount + task.file().recordCount()) > maxRecordsPerMicroBatch) {
+                shouldContinueReading = false;
+                break;
+              }
+

Review Comment:
   ACK let me take this as follow-up of this pr immediately and add a tracking issue as well meanwhile. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.

amogh-jahagirdar commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1178650056


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##########
@@ -295,6 +309,139 @@ private static StreamingOffset determineStartingOffset(Table table, Long fromTim
     }
   }
 
+  @Override
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
+  public Offset latestOffset(Offset startOffset, ReadLimit limit) {
+    // calculate end offset get snapshotId from the startOffset
+    Preconditions.checkArgument(
+        startOffset instanceof StreamingOffset,
+        "Invalid start offset: %s is not a StreamingOffset",
+        startOffset);
+
+    table.refresh();
+    if (table.currentSnapshot() == null) {
+      return StreamingOffset.START_OFFSET;
+    }
+
+    if (table.currentSnapshot().timestampMillis() < fromTimestamp) {
+      return StreamingOffset.START_OFFSET;
+    }
+
+    // end offset can expand to multiple snapshots
+    StreamingOffset startingOffset = (StreamingOffset) startOffset;
+
+    if (startOffset.equals(StreamingOffset.START_OFFSET)) {
+      startingOffset = determineStartingOffset(table, fromTimestamp);
+    }
+
+    Snapshot curSnapshot = table.snapshot(startingOffset.snapshotId());
+    validateCurrentSnapshotExists(curSnapshot, startingOffset);
+
+    int startPosOfSnapOffset = (int) startingOffset.position();
+
+    boolean scanAllFiles = startingOffset.shouldScanAllFiles();
+
+    boolean shouldContinueReading = true;
+    int curFilesAdded = 0;
+    int curRecordCount = 0;
+    int curPos = 0;
+
+    // Note : we produce nextOffset with pos as non-inclusive
+    while (shouldContinueReading) {
+      // generate manifest index for the curSnapshot
+      List<Pair<ManifestFile, Integer>> indexedManifests =
+          MicroBatches.skippedManifestIndexesFromSnapshot(
+              table.io(), curSnapshot, startPosOfSnapOffset, scanAllFiles);
+      // this is under assumption we will be able to add at-least 1 file in the new offset
+      for (int idx = 0; idx < indexedManifests.size() && shouldContinueReading; idx++) {
+        // be rest assured curPos >= startFileIndex
+        curPos = indexedManifests.get(idx).second();
+        try (CloseableIterable<FileScanTask> taskIterable =
+                MicroBatches.openManifestFile(
+                    table.io(),
+                    table.specs(),
+                    caseSensitive,
+                    curSnapshot,
+                    indexedManifests.get(idx).first(),
+                    scanAllFiles);
+            CloseableIterator<FileScanTask> taskIter = taskIterable.iterator()) {
+          while (taskIter.hasNext()) {
+            FileScanTask task = taskIter.next();
+            if (curPos >= startPosOfSnapOffset) {
+              // TODO : use readLimit provided in function param, the readLimits are derived from
+              // these 2 properties.
+              if ((curFilesAdded + 1) > maxFilesPerMicroBatch
+                  || (curRecordCount + task.file().recordCount()) > maxRecordsPerMicroBatch) {
+                shouldContinueReading = false;
+                break;
+              }
+

Review Comment:
   Since this is a forward port this should be fine for now but probably worth creating an issue to track the ToDo to use the provided `ReadLimit`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1175903900


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   Agree with you, ideally it's not required, I think might have just picked up from previous conf's like below : 
   https://github.com/apache/iceberg/blob/e2c7e77310329b2113d4122eda4bd0d46e1d252d/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java#L242-L248
   
   
   Let me create a pr to clean up the conf's introduced above as well as existing conf's to just return primitive types, if we always return a default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1176744004


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   Added a [pr-7429](https://github.com/apache/iceberg/pull/7429/files) for the same https://github.com/apache/iceberg/pull/7429/files



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] wypoon commented on pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "wypoon (via GitHub)" <gi...@apache.org>.

wypoon commented on PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#issuecomment-1528507367

   @singhpk234 @jackye1995 
   After I pulled from master, org.apache.iceberg.spark.source.TestStructuredStreamingRead3.testReadStreamOnIcebergTableWithMultipleSnapshots_WithNumberOfRows_1 is failing for me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.

aokolnychyi commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1175813062


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   Why return `Integer` if the value can't be null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.

singhpk234 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1175903900


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   Agree with you, ideally it's not required, I think might have just picked up from previous conf's like below : 
   https://github.com/apache/iceberg/blob/e2c7e77310329b2113d4122eda4bd0d46e1d252d/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java#L242-L248
   
   
   Let me create a pr to clean up the conf's introduced above as well as existing conf's to just return primitive types, if it's not optional.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.

jackye1995 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1178595519


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   I wonder if this is the right approach. Using boxed reference for many configs simply avoids the need to use extreme values like int/long min/max. I feel it makes more sense to keep that behavior and remove the usage of `Long.MIN_VALUE` here? Any thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7422: Spark 3.4: Support rate limit in Spark Streaming

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.

jackye1995 commented on code in PR #7422:
URL: https://github.com/apache/iceberg/pull/7422#discussion_r1178596565


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -255,6 +255,22 @@ public Long endTimestamp() {
     return confParser.longConf().option(SparkReadOptions.END_TIMESTAMP).parseOptional();
   }
 
+  public Integer maxFilesPerMicroBatch() {

Review Comment:
   I see, after reading the linked PR, looks like we are using those int/long min/max all the time. In that case it's probably fine to change them to unboxed primitives since it's established pattern, and the value is always set. I approved that PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org