You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/24 01:23:11 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request #2858: Core: Support multiple specs in OutputFileFactory

aokolnychyi opened a new pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858


   This PR extends `OutputFileFactory` to support multiple specs. When we write delete files, we should write them using the the same partition spec as data files they reference. This means we will need to write to multiple specs in a single transaction. Instead of having an output factory per spec, I think it is better to support multiple specs in a single factory.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676354406



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -263,7 +263,7 @@ public String toString() {
     public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
       Table table = tableBroadcast.value();
 
-      OutputFileFactory fileFactory = new OutputFileFactory(table, format, partitionId, taskId);
+      OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, format, partitionId, taskId).build();

Review comment:
       Nit: Instead of keeping the constructor arguments, since we're using a builder, would it make sense to use a builder pattern constructor, such as the following?
   
   Since these are all required fields, this might introduce more complexity than it is worth, but some of the fields _seem_ like things we might be able to determine other ways (such as partitionId). But I'm not quite sure if that's correct or not.
   
   ```
   OutputFileFactor
       .builderFor(table)
       .withFormat(format)
       .withPartitonId(partitionId)
       .withTaskId(taskId)
       .build();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676043003



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {

Review comment:
       It is a bit debatable what we consider as required arguments. I think these 4 are always set.
   
   We may consider defaulting the format to the current table format for data files but we have to ensure it is the same format we use in data writers. Given that we have write options that control formats and we are about to start writing delete files, it seems safer to always require an explicit format.
   
   We can default partition and task IDs to 1 just for tests. These values must be always set in real scenarios.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676745707



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -263,7 +263,7 @@ public String toString() {
     public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
       Table table = tableBroadcast.value();
 
-      OutputFileFactory fileFactory = new OutputFileFactory(table, format, partitionId, taskId);
+      OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, format, partitionId, taskId).build();

Review comment:
       I think the intent is to make table, partition, and task required. Otherwise, it isn't obvious that the builder will fail at runtime if you don't call `withTaskId`. It makes sense to me to avoid runtime issues by adding the required arguments to `builderFor` and anything that can be customized as a builder method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#issuecomment-887054364


   Merged. Thanks for the reviews, everyone!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#issuecomment-885981616


   cc @pvary @RussellSpitzer @rdblue @openinx @jun-he @yyanyy @jackye1995 @flyrain @kbendick @karuppayya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676043003



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {

Review comment:
       It is a bit debatable what we consider as required arguments. I think these 4 are always set.
   
   We may consider defaulting the format to the current table format for data files but we have to ensure it is the same format we use in data writers. That's why it safer to always require an explicit value.
   
   We can default partition and task IDs to 1 just for tests. These values must be always set in real scenarios.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676354406



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -263,7 +263,7 @@ public String toString() {
     public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
       Table table = tableBroadcast.value();
 
-      OutputFileFactory fileFactory = new OutputFileFactory(table, format, partitionId, taskId);
+      OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, format, partitionId, taskId).build();

Review comment:
       Nit: Instead of keeping the constructor arguments, since we're using a builder, would it make sense to use a builder pattern constructor, such as the following?
   
   Since these are all required fields, this might introduce more complexity than it is worth, but some of the fields _seem_ like things we might be able to determine other ways (such as partitionId). But I'm not quite sure if that's correct or not.
   
   ```java
   OutputFileFactor
       .builderFor(table)
       .withFormat(format)
       .withPartitonId(partitionId)
       .withTaskId(taskId)
       .build();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676260942



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {
+    return new Builder(table, format, partitionId, taskId);
+  }
+
   private String generateFilename() {
     return format.addExtension(
         String.format("%05d-%d-%s-%05d", partitionId, taskId, operationId, fileCount.incrementAndGet()));
   }
 
   /**
-   * Generates EncryptedOutputFile for UnpartitionedWriter.
+   * Generates an {@link EncryptedOutputFile} for unpartitioned writes.
    */
   public EncryptedOutputFile newOutputFile() {
     OutputFile file = io.newOutputFile(locations.newDataLocation(generateFilename()));
     return encryptionManager.encrypt(file);
   }
 
   /**
-   * Generates EncryptedOutputFile for PartitionedWriter.
+   * Generates an {@link EncryptedOutputFile} for partitioned writes in the default spec.
    */
   public EncryptedOutputFile newOutputFile(StructLike partition) {
+    return newOutputFile(defaultSpec, partition);
+  }
+
+  /**
+   * Generates an {@link EncryptedOutputFile} for partitioned writes in a given spec.
+   */
+  public EncryptedOutputFile newOutputFile(PartitionSpec spec, StructLike partition) {

Review comment:
       nit: wondering if there are tests covering this method and also the scenario writing to multiple specs in a single transaction?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#issuecomment-887058173


   Thanks, folks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#issuecomment-887008108


   @jun-he @flyrain, I added a test for multiple specs. More tests will be in following PRs when we start writing and committing files that belong to multiple specs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676858898



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {

Review comment:
       Hopefully this will standardize that util method so that only this class is needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676260942



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {
+    return new Builder(table, format, partitionId, taskId);
+  }
+
   private String generateFilename() {
     return format.addExtension(
         String.format("%05d-%d-%s-%05d", partitionId, taskId, operationId, fileCount.incrementAndGet()));
   }
 
   /**
-   * Generates EncryptedOutputFile for UnpartitionedWriter.
+   * Generates an {@link EncryptedOutputFile} for unpartitioned writes.
    */
   public EncryptedOutputFile newOutputFile() {
     OutputFile file = io.newOutputFile(locations.newDataLocation(generateFilename()));
     return encryptionManager.encrypt(file);
   }
 
   /**
-   * Generates EncryptedOutputFile for PartitionedWriter.
+   * Generates an {@link EncryptedOutputFile} for partitioned writes in the default spec.
    */
   public EncryptedOutputFile newOutputFile(StructLike partition) {
+    return newOutputFile(defaultSpec, partition);
+  }
+
+  /**
+   * Generates an {@link EncryptedOutputFile} for partitioned writes in a given spec.
+   */
+  public EncryptedOutputFile newOutputFile(PartitionSpec spec, StructLike partition) {

Review comment:
       nit: wondering if there are tests covering the scenario writing to multiple specs in a single transaction using this method?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676802730



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {

Review comment:
       +1 for having default file format from the table properties, so that when `write-format` option is not set it can default to the table default instead of finding it again and pass it to the call.
   
   And btw it's probably a good idea to have a util method to get the file format from the properties map, since we are doing it everywhere, but no need to have that in this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#issuecomment-886943603


   This PR is specific to the output file factory. We can add tests for committing multiple specs when those features are added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676923441



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {

Review comment:
       I moved the format from the list of required args. Let's consider adding a utility method once we have clarity on how the writer factory will look like and how it will be used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676924105



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -263,7 +263,7 @@ public String toString() {
     public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
       Table table = tableBroadcast.value();
 
-      OutputFileFactory fileFactory = new OutputFileFactory(table, format, partitionId, taskId);
+      OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, format, partitionId, taskId).build();

Review comment:
       Correct, the intent was to indicate that these configs are required even in the API.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#issuecomment-886000852


   I am OK with the Hive changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676354406



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -263,7 +263,7 @@ public String toString() {
     public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
       Table table = tableBroadcast.value();
 
-      OutputFileFactory fileFactory = new OutputFileFactory(table, format, partitionId, taskId);
+      OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, format, partitionId, taskId).build();

Review comment:
       Nit: Instead of keeping the constructor arguments, since we're using a builder, would it make sense to use a builder pattern constructor, such as the following?
   
   Since these are all required fields, this might introduce more complexity than it is worth, but some of the fields _seem_ like things we might be able to determine other ways (such as partitionId), at least in specific engines. But I'm not quite sure if that's correct or not or how exactly that would look at the moment.
   
   ```java
   OutputFileFactory
       .builderFor(table)
       .withFormat(format)
       .withPartitonId(partitionId)
       .withTaskId(taskId)
       .build();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676354406



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Writer.java
##########
@@ -263,7 +263,7 @@ public String toString() {
     public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
       Table table = tableBroadcast.value();
 
-      OutputFileFactory fileFactory = new OutputFileFactory(table, format, partitionId, taskId);
+      OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, format, partitionId, taskId).build();

Review comment:
       Nit: Instead of keeping the constructor arguments, since we're using a builder, would it make sense to use a builder pattern constructor, such as the following?
   
   Since these are all required fields, this might introduce more complexity than it is worth, but some of the fields _seem_ like things we might be able to determine other ways (such as partitionId). But I'm not quite sure if that's correct or not.
   
   ```java
   OutputFileFactory
       .builderFor(table)
       .withFormat(format)
       .withPartitonId(partitionId)
       .withTaskId(taskId)
       .build();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2858: Core: Support multiple specs in OutputFileFactory

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2858:
URL: https://github.com/apache/iceberg/pull/2858#discussion_r676189525



##########
File path: core/src/main/java/org/apache/iceberg/io/OutputFileFactory.java
##########
@@ -95,25 +100,71 @@ public OutputFileFactory(PartitionSpec spec, FileFormat format, LocationProvider
     this.operationId = operationId;
   }
 
+  public static Builder builderFor(Table table, FileFormat format, int partitionId, long taskId) {

Review comment:
       What about removing format and defaulting that using the table property? Then if an engine wants to override it can, but otherwise it is automatic.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org