You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/21 14:35:15 UTC

[GitHub] [beam] iemejia commented on a change in pull request #13585: [BEAM-10160] Change API to enable to append to Iceberg

iemejia commented on a change in pull request #13585:
URL: https://github.com/apache/beam/pull/13585#discussion_r546732773



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
##########
@@ -1741,9 +1740,8 @@ public void populateDisplayData(DisplayData.Builder builder) {
     }
 
     @Override
-    public PDone expand(PCollection<T> input) {
-      input.apply(inner);
-      return PDone.in(input.getPipeline());
+    public WriteFilesResult<?> expand(PCollection<T> input) {

Review comment:
       This solution looks ok, but I am wary of the consequences of changing the return type. It looks like we introduced TypedWrite on Beam for the achieve the same goal without breaking backwards compatibility.
   
   The 'modern' preferred way to write files on Beam is via `FileIO.write()` which already returns `WriteFilesResult`.
   Can you check if you we can achieve the intended results by relying on FileIO.write() + AvroIO.sink() ? or is there anything missing?
   
   CC: @jkff in case you have some extra detail to add.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
##########
@@ -109,12 +107,12 @@ public Schema schema() {
     }
 
     @Override
-    public PTransform<PCollection<Row>, POutput> buildWriter() {
+    public PTransform<PCollection<Row>, WriteFilesResult<?>> buildWriter() {

Review comment:
       This looks good, here the change has less issues since this class is `@Internal`
   CC @TheNeuralBit for awarenes.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
##########
@@ -90,7 +89,8 @@
 /**
  * A {@link PTransform} that writes to a {@link FileBasedSink}. A write begins with a sequential
  * global initialization of a sink, followed by a parallel write, and ends with a sequential
- * finalization of the write. The output of a write is {@link PDone}.
+ * finalization of the write. The output of a write is {@link WriteFilesResult} with the files

Review comment:
       :+1: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org