You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/12 10:52:27 UTC

[GitHub] [iceberg] youngxinler opened a new pull request, #6571: java api doc add write data example

youngxinler opened a new pull request, #6571:
URL: https://github.com/apache/iceberg/pull/6571

   about #6510  .
   java api doc in iceberg doc, that not descripte how to use java api write data to iceberg table. that may be confusing for people who want to use java api to write data.
   this PR to add the example about how to write data using java api.
   
   @nastra can you help with the review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on pull request #6571: Docs: java api doc add write data example

Posted by GitBox <gi...@apache.org>.
youngxinler commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1396490947

   @jackye1995 @rdblue Can I trouble you if you have time to do a review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] s-akhtar-baig commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "s-akhtar-baig (via GitHub)" <gi...@apache.org>.
s-akhtar-baig commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1197031341


##########
data/src/main/java/org/apache/iceberg/data/GenericTaskWriter.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.BaseTaskWriter;
+import org.apache.iceberg.io.FileAppenderFactory;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.io.PartitionedFanoutWriter;
+import org.apache.iceberg.io.UnpartitionedWriter;
+
+public class GenericTaskWriter<T extends StructLike> extends BaseTaskWriter<T> {
+  private final BaseTaskWriter<T> taskWriter;
+
+  public GenericTaskWriter(
+      PartitionSpec spec,
+      FileFormat format,
+      FileAppenderFactory<T> appenderFactory,
+      OutputFileFactory fileFactory,
+      FileIO io,
+      long targetFileSize) {
+    super(spec, format, appenderFactory, fileFactory, io, targetFileSize);
+    if (spec.isPartitioned()) {
+      final InternalRecordWrapper recordWrapper =
+          new InternalRecordWrapper(spec.schema().asStruct());
+      final PartitionKey partitionKey = new PartitionKey(spec, spec.schema());
+
+      taskWriter =
+          new PartitionedFanoutWriter<T>(

Review Comment:
   Would it make more sense to provide the option to choose from PartitionedWriter and PartitionedFanoutWriter here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1095384483


##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData

Review Comment:
   WriteData -> Writing Data



##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import org.apache.commons.compress.utils.Lists;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.*;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema SCHEMA =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec SPEC =
+      PartitionSpec.builderFor(SCHEMA).hour("event_time").identity("level").build();
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }

Review Comment:
   Nit style: newlines after if/else blocks



##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import org.apache.commons.compress.utils.Lists;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.*;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema SCHEMA =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec SPEC =
+      PartitionSpec.builderFor(SCHEMA).hour("event_time").identity("level").build();
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }
+    Catalog catalog = new HadoopCatalog(new Configuration(), testWareHouse.getPath());
+    this.table =
+        catalog.createTable(
+            TableIdentifier.of("logging", "logs"),
+            SCHEMA,
+            SPEC,
+            Collections.singletonMap("write.format.default", testFileFormat));
+  }
+
+  // test write and java API write data code demo
+  private void writeRecords() throws IOException {
+    GenericAppenderFactory appenderFactory =
+        new GenericAppenderFactory(table.schema(), table.spec());
+    int partitionId = 1, taskId = 1;
+    FileFormat fileFormat =
+        FileFormat.valueOf(
+            table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+    OutputFileFactory outputFileFactory =
+        OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+    // TaskWriter write records into file. (the same is ok for unpartition table)
+    long targetFileSizeInBytes = 50L * 1024 * 1024;
+    GenericTaskWriter<Record> genericTaskWriter =
+        new GenericTaskWriter(
+            table.spec(),
+            fileFormat,
+            appenderFactory,
+            outputFileFactory,
+            table.io(),
+            targetFileSizeInBytes);
+
+    GenericRecord genericRecord = GenericRecord.create(table.schema());
+    // assume write 1000 records
+    for (int i = 0; i < 1000; i++) {
+      GenericRecord record = genericRecord.copy();
+      record.setField("level", i % 6 == 0 ? "error" : "info");
+      record.setField("event_time", OffsetDateTime.now());
+      record.setField("message", "Iceberg is a great table format");
+      record.setField("call_stack", Collections.singletonList("NullPointerException"));
+      genericTaskWriter.write(record);
+      // just for test, remove from doc code
+      this.testRecords.add(record);
+    }
+    // after the data file is written above,
+    // the written data file is submitted to the metadata of the table through Table API.
+    AppendFiles appendFiles = table.newAppend();
+    for (DataFile dataFile : genericTaskWriter.dataFiles()) {
+      appendFiles.appendFile(dataFile);
+    }
+    // submit data file.

Review Comment:
   Same as above, I don't think we need this inline comment.



##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import org.apache.commons.compress.utils.Lists;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.*;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema SCHEMA =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec SPEC =
+      PartitionSpec.builderFor(SCHEMA).hour("event_time").identity("level").build();
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }
+    Catalog catalog = new HadoopCatalog(new Configuration(), testWareHouse.getPath());
+    this.table =
+        catalog.createTable(
+            TableIdentifier.of("logging", "logs"),
+            SCHEMA,
+            SPEC,
+            Collections.singletonMap("write.format.default", testFileFormat));
+  }
+
+  // test write and java API write data code demo
+  private void writeRecords() throws IOException {
+    GenericAppenderFactory appenderFactory =
+        new GenericAppenderFactory(table.schema(), table.spec());
+    int partitionId = 1, taskId = 1;
+    FileFormat fileFormat =
+        FileFormat.valueOf(
+            table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+    OutputFileFactory outputFileFactory =
+        OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+    // TaskWriter write records into file. (the same is ok for unpartition table)
+    long targetFileSizeInBytes = 50L * 1024 * 1024;
+    GenericTaskWriter<Record> genericTaskWriter =
+        new GenericTaskWriter(
+            table.spec(),
+            fileFormat,
+            appenderFactory,
+            outputFileFactory,
+            table.io(),
+            targetFileSizeInBytes);
+
+    GenericRecord genericRecord = GenericRecord.create(table.schema());
+    // assume write 1000 records
+    for (int i = 0; i < 1000; i++) {
+      GenericRecord record = genericRecord.copy();
+      record.setField("level", i % 6 == 0 ? "error" : "info");
+      record.setField("event_time", OffsetDateTime.now());
+      record.setField("message", "Iceberg is a great table format");
+      record.setField("call_stack", Collections.singletonList("NullPointerException"));
+      genericTaskWriter.write(record);
+      // just for test, remove from doc code

Review Comment:
   IMO I don't think we need this inline comment 



##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java API can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 rows of data to the table.

Review Comment:
   I think we can just say "The following example will demonstrate how to write data files into Iceberg tables". The other details can just be inline comments imo.



##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java API can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 rows of data to the table.
+
+The structure of this table is the same as the demo table in java-api-quickstart
+
+```java
+/**
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory =
+    new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat =
+    FileFormat.valueOf(
+        table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory =
+    OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();

Review Comment:
   Nit for readability, I think this needs more spacing it's a bit dense to read



##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java API can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 rows of data to the table.
+
+The structure of this table is the same as the demo table in java-api-quickstart
+
+```java
+/**
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory =
+    new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat =
+    FileFormat.valueOf(
+        table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory =
+    OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+// TaskWriter write records into file. (the same is ok for unpartition table)
+long targetFileSizeInBytes = 50L * 1024 * 1024;
+GenericTaskWriter<Record> genericTaskWriter =
+    new GenericTaskWriter(
+        table.spec(),
+        fileFormat,
+        appenderFactory,
+        outputFileFactory,
+        table.io(),
+        targetFileSizeInBytes);
+
+GenericRecord genericRecord = GenericRecord.create(table.schema());
+// assume write 1000 records

Review Comment:
   Either `write 1000 records` or we may just leave it off. Since it's an example I think it's fine to keep it.



##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java API can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 rows of data to the table.
+
+The structure of this table is the same as the demo table in java-api-quickstart
+
+```java
+/**
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory =
+    new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat =
+    FileFormat.valueOf(
+        table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory =
+    OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+// TaskWriter write records into file. (the same is ok for unpartition table)
+long targetFileSizeInBytes = 50L * 1024 * 1024;
+GenericTaskWriter<Record> genericTaskWriter =
+    new GenericTaskWriter(
+        table.spec(),
+        fileFormat,
+        appenderFactory,
+        outputFileFactory,
+        table.io(),
+        targetFileSizeInBytes);
+
+GenericRecord genericRecord = GenericRecord.create(table.schema());
+// assume write 1000 records
+for (int i = 0; i < 1000; i++) {
+    GenericRecord record = genericRecord.copy();
+    record.setField("level", i % 6 == 0 ? "error" : "info");
+    record.setField("event_time", OffsetDateTime.now());
+    record.setField("message", "Iceberg is a great table format");
+    record.setField("call_stack", Collections.singletonList("NullPointerException"));
+    genericTaskWriter.write(record);
+}

Review Comment:
   Nit: newline after the loop



##########
data/src/main/java/org/apache/iceberg/data/GenericTaskWriter.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.BaseTaskWriter;
+import org.apache.iceberg.io.FileAppenderFactory;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.io.PartitionedFanoutWriter;
+import org.apache.iceberg.io.UnpartitionedWriter;
+
+public class GenericTaskWriter<T extends StructLike> extends BaseTaskWriter<T> {

Review Comment:
   I understand this wraps the writer depending on if it's partitioned or not but since this is public just want to make sure it's really needed. Going through the code it seems like every engine (Spark/Flink etc) implements their own writer and specifies their own partitioning. But effectively they all follow the same principle which is implement PartitionWriter<EnginesRowRepresentation> which applies the partition function on the row representation.
   
   The logic across the engine also looks duplicated with the same if(partition){use the partitioned writer) else {unpartitioned writer} so I think it's nice this PR has a generic writer which could be leverage. So this change makes sense to me. Maybe for partitioned writes we also allow passing in a custom partition function but we don't need to add that until we see a need.



##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java API can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 rows of data to the table.
+
+The structure of this table is the same as the demo table in java-api-quickstart
+
+```java
+/**
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory =
+    new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat =
+    FileFormat.valueOf(
+        table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory =
+    OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+// TaskWriter write records into file. (the same is ok for unpartition table)
+long targetFileSizeInBytes = 50L * 1024 * 1024;
+GenericTaskWriter<Record> genericTaskWriter =
+    new GenericTaskWriter(
+        table.spec(),
+        fileFormat,
+        appenderFactory,
+        outputFileFactory,
+        table.io(),
+        targetFileSizeInBytes);
+
+GenericRecord genericRecord = GenericRecord.create(table.schema());
+// assume write 1000 records
+for (int i = 0; i < 1000; i++) {
+    GenericRecord record = genericRecord.copy();
+    record.setField("level", i % 6 == 0 ? "error" : "info");
+    record.setField("event_time", OffsetDateTime.now());
+    record.setField("message", "Iceberg is a great table format");
+    record.setField("call_stack", Collections.singletonList("NullPointerException"));
+    genericTaskWriter.write(record);
+}
+// after the data file is written above,
+// the written data file is submitted to the metadata of the table through Table API.
+AppendFiles appendFiles = table.newAppend();
+for (DataFile dataFile : genericTaskWriter.dataFiles()) {
+    appendFiles.appendFile(dataFile);
+}

Review Comment:
   I think the inline comment could be something like 
   
   // Call the AppendFiles API on each of the data files
   
   
   and then instead of 
   
   `submit data file` we do `Commit the AppendFiles operation`



##########
docs/java-api.md:
##########
@@ -147,6 +147,69 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java API can write data into iceberg table.

Review Comment:
   "The Java API can be used to write data into Iceberg tables"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1081543402


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
+final InternalRecordWrapper recordWrapper = new InternalRecordWrapper(table.schema().asStruct());
+
+// partitionedFanoutWriter will auto partitioned record and create the partitioned writer
+PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<Record>(table.spec(), FileFormat.PARQUET, appenderFactory, outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {
+    @Override
+    protected PartitionKey partition(Record record) {
+        partitionKey.partition(recordWrapper.wrap(record));
+        return partitionKey;
+    }
+};
+
+GenericRecord genericRecord = GenericRecord.create(table.schema());
+
+// assume write 1000 records
+for (int i = 0; i < 1000; i++) {
+    GenericRecord record = genericRecord.copy();
+    record.setField("level",  i % 6 == 0 ? "error" : "info");

Review Comment:
   we might want to specify the schema of the records you insert before the code block so people can understand it more easily



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090226814


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());

Review Comment:
   > What if we put this as a test, and directly reference the test as example in the doc page? By doing that the code can be up to date with latest API and checkstyle standards.
   
   that sounds good,  i change test code and doc demo code make them same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1083466424


##########
data/src/main/java/org/apache/iceberg/data/GenericTaskWriter.java:
##########
@@ -0,0 +1,79 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  *   http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing,
+ *  * software distributed under the License is distributed on an
+ *  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  * KIND, either express or implied.  See the License for the
+ *  * specific language governing permissions and limitations
+ *  * under the License.
+ *
+ */
+
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.BaseTaskWriter;
+import org.apache.iceberg.io.FileAppenderFactory;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.io.PartitionedFanoutWriter;
+import org.apache.iceberg.io.UnpartitionedWriter;
+
+public class GenericTaskWriter<T extends StructLike> extends BaseTaskWriter<T> {
+    private final BaseTaskWriter<T> taskWriter;
+
+    public GenericTaskWriter(PartitionSpec spec,
+                             FileFormat format,
+                             FileAppenderFactory<T> appenderFactory,
+                             OutputFileFactory fileFactory,
+                             FileIO io,
+                             long targetFileSize) {
+        super(spec, format, appenderFactory, fileFactory, io, targetFileSize);
+        if (spec.isPartitioned()) {
+            final InternalRecordWrapper recordWrapper = new InternalRecordWrapper(spec.schema().asStruct());

Review Comment:
   for InternalRecordWrapper, i just use spec.schema().asStruct() as struct, because the wrapper just for partition field data.
   If using table.schema().asStruct(), this will add an additional parameter to the GenericTaskWriter.
   I tested it and it's OK.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ztaylor797 commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "ztaylor797 (via GitHub)" <gi...@apache.org>.
ztaylor797 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1133304983


##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.Collections;
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema schema =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec spec =
+      PartitionSpec.builderFor(schema).hour("event_time").identity("level").build();
+
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }
+
+    Catalog catalog = new HadoopCatalog(new Configuration(), testWareHouse.getPath());
+    this.table =
+        catalog.createTable(
+            TableIdentifier.of("logging", "logs"),
+            schema,
+            spec,
+            Collections.singletonMap("write.format.default", testFileFormat));
+  }
+
+  // test write and java API write data code demo
+  private void writeRecords() throws IOException {
+    GenericAppenderFactory appenderFactory =
+        new GenericAppenderFactory(table.schema(), table.spec());
+
+    FileFormat fileFormat =
+        FileFormat.valueOf(
+            table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+
+    int partitionId = 1;

Review Comment:
   I appreciate the info, turns out I was using the wrong constructor on this:
   ```
       GenericAppenderFactory appenderFactory =
           new GenericAppenderFactory(table.schema(), table.spec());
   ```
   Had omitted the second argument. Works great now, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090118896


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart

Review Comment:
   Can we move these to sentences instead of javadoc?



##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.

Review Comment:
   nit: rows instead of pieces?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1083463816


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
+final InternalRecordWrapper recordWrapper = new InternalRecordWrapper(table.schema().asStruct());
+
+// partitionedFanoutWriter will auto partitioned record and create the partitioned writer
+PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<Record>(table.spec(), FileFormat.PARQUET, appenderFactory, outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {

Review Comment:
   > I think we might need example for both partitioned and not partitioned
   
   done with GenericTaskWriter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090118998


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.

Review Comment:
   nit: API instead of api?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1407538619

   @jackye1995 @JonasJ-ap   Can I trouble you if you have time to do a review? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ZuzannaSadowska commented on pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "ZuzannaSadowska (via GitHub)" <gi...@apache.org>.
ZuzannaSadowska commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1674613926

   This was really helpful, thank you @youngxinler! Is there any progress on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1422170497

   @jackye1995 @amogh-jahagirdar  about this PR, please let me know if it can be merged into the master or if there is anything that needs to be done further?  I would be happy to see your opinions.  thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090226814


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());

Review Comment:
   > What if we put this as a test, and directly reference the test as example in the doc page? By doing that the code can be up to date with latest API and checkstyle standards.
   
   thanks for @jackye1995 review, that sounds good,  i change test code and doc demo code make them same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090226643


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());

Review Comment:
   that sounds good,  i change test code and doc demo code make them same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1133082193


##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.Collections;
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema schema =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec spec =
+      PartitionSpec.builderFor(schema).hour("event_time").identity("level").build();
+
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }
+
+    Catalog catalog = new HadoopCatalog(new Configuration(), testWareHouse.getPath());
+    this.table =
+        catalog.createTable(
+            TableIdentifier.of("logging", "logs"),
+            schema,
+            spec,
+            Collections.singletonMap("write.format.default", testFileFormat));
+  }
+
+  // test write and java API write data code demo
+  private void writeRecords() throws IOException {
+    GenericAppenderFactory appenderFactory =
+        new GenericAppenderFactory(table.schema(), table.spec());
+
+    FileFormat fileFormat =
+        FileFormat.valueOf(
+            table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+
+    int partitionId = 1;

Review Comment:
   partitionId First part of the file name.
   taskId Second part of the file name.
   operationId Third part of the file name.
   used to generate file path. Can you show me the code you used to write it? PartitionSpec should be set table.spec().



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1081542409


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());

Review Comment:
   Thanks for adding this, I actually get asked a few times how to write Iceberg data without using a compute engine. This answers exactly that question. 
   
   However, it feels quite hacky to demonstrate this as the solution for people to use, especially the part that we have to use `InternalRecordWrapper` and then create a new class of the abstract `PartitionedFanoutWriter` on the fly. In that case, could we just add that as an actual implementation of `PartitionedFanoutWriter` and contribute that to the codebase? 
   
   Also I believe there is something similar like `TestTaskWriter` in the tests. Maybe what we should do is bring that out of tests so people can actually use it.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1095520865


##########
data/src/main/java/org/apache/iceberg/data/GenericTaskWriter.java:
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.BaseTaskWriter;
+import org.apache.iceberg.io.FileAppenderFactory;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.io.PartitionedFanoutWriter;
+import org.apache.iceberg.io.UnpartitionedWriter;
+
+public class GenericTaskWriter<T extends StructLike> extends BaseTaskWriter<T> {

Review Comment:
   Thanks for review. @amogh-jahagirdar About why `GenericTaskWriter` is really needed.  Discussion above could explain some reasons.
   IMO: If users need to write data into partitioned table,  they maybe need to understand `PartitionedFanoutWriter` and `InternalGenericRow`,  because genericRecord  object write directly will cause some partitioned fields cast error. About #6510 I had run into this problem, and it really confused me.  `GenericTaskWriter` reduces the difficulty for JVM applications to write to the iceberg table for users.
   I have the same idea about custom partition function, add it when needed. 
   
   Regarding the above code comment, I have solved it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ztaylor797 commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "ztaylor797 (via GitHub)" <gi...@apache.org>.
ztaylor797 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1132799425


##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.Collections;
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema schema =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec spec =
+      PartitionSpec.builderFor(schema).hour("event_time").identity("level").build();
+
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }
+
+    Catalog catalog = new HadoopCatalog(new Configuration(), testWareHouse.getPath());
+    this.table =
+        catalog.createTable(
+            TableIdentifier.of("logging", "logs"),
+            schema,
+            spec,
+            Collections.singletonMap("write.format.default", testFileFormat));
+  }
+
+  // test write and java API write data code demo
+  private void writeRecords() throws IOException {
+    GenericAppenderFactory appenderFactory =
+        new GenericAppenderFactory(table.schema(), table.spec());
+
+    FileFormat fileFormat =
+        FileFormat.valueOf(
+            table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+
+    int partitionId = 1;

Review Comment:
   @youngxinler What is the expected behavior here with partitionId being hard-coded?
   
   I have tried to implement your code here in my own project, but it results in the partitions reporting as undefined instead of properly fanning out, at least when querying from Trino.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1087385526


##########
docs/java-api.md:
##########
@@ -147,6 +147,58 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat = FileFormat.valueOf(table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+// TaskWriter write records into files. (the same is ok for unpartiton table)
+GenericTaskWriter<Record> genericTaskWriter = new GenericTaskWriter(table.spec(), fileFormat, appenderFactory, outputFileFactory, table.io(), 50L * 1024 * 1024);

Review Comment:
   Shall we make `50L * 1024 * 1024` a local variable named `targetFileSizeInBytes` so that readers can easily understand the meaning of this argument?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1083465798


##########
data/src/main/java/org/apache/iceberg/data/GenericTaskWriter.java:
##########
@@ -0,0 +1,79 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
+ *  *   http://www.apache.org/licenses/LICENSE-2.0
+ *  *
+ *  * Unless required by applicable law or agreed to in writing,
+ *  * software distributed under the License is distributed on an
+ *  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  * KIND, either express or implied.  See the License for the
+ *  * specific language governing permissions and limitations
+ *  * under the License.
+ *
+ */
+
+package org.apache.iceberg.data;
+
+import java.io.IOException;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionKey;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.io.BaseTaskWriter;
+import org.apache.iceberg.io.FileAppenderFactory;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.io.PartitionedFanoutWriter;
+import org.apache.iceberg.io.UnpartitionedWriter;
+
+public class GenericTaskWriter<T extends StructLike> extends BaseTaskWriter<T> {

Review Comment:
   GenericTaskWriter is here,  it use UnpartitionedWriter  or PartitionedFanoutWriter to write data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1087654680


##########
docs/java-api.md:
##########
@@ -147,6 +147,58 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat = FileFormat.valueOf(table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+// TaskWriter write records into files. (the same is ok for unpartiton table)
+GenericTaskWriter<Record> genericTaskWriter = new GenericTaskWriter(table.spec(), fileFormat, appenderFactory, outputFileFactory, table.io(), 50L * 1024 * 1024);

Review Comment:
   > Shall we make `50L * 1024 * 1024` a local variable named `targetFileSizeInBytes` so that readers can easily understand the meaning of this argument?
   
   good advice.  i add this field.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090118756


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+int partitionId = 1, taskId = 1;
+FileFormat fileFormat = FileFormat.valueOf(table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(fileFormat).build();
+// TaskWriter write records into files. (the same is ok for unpartiton table)
+long targetFileSizeInBytes = 50L * 1024 * 1024;
+GenericTaskWriter<Record> genericTaskWriter = new GenericTaskWriter(table.spec(), fileFormat, appenderFactory, outputFileFactory, table.io(), targetFileSizeInBytes);
+
+GenericRecord genericRecord = GenericRecord.create(table.schema());
+// assume write 1000 records
+for (int i = 0; i < 1000; i++) {
+    GenericRecord record = genericRecord.copy();
+    record.setField("level",  i % 6 == 0 ? "error" : "info");
+    record.setField("event_time", OffsetDateTime.now());
+    record.setField("message", "Iceberg is a great table format");
+    record.setField("call_stack", Collections.singletonList("NullPointerException"));
+    genericTaskWriter.write(record);
+}
+// after the data file is written above,
+// the written data file is submitted to the metadata of the table through Table's Api.
+AppendFiles appendFiles = table.newAppend();
+for (DataFile dataFile : genericTaskWriter.dataFiles()) {
+    appendFiles.appendFile(dataFile);
+}
+// The submitted file generates a snapshot and submit it.
+Snapshot res = appendFiles.apply(); // Of course, you can skip apply to commit directly.

Review Comment:
   in that case I think we can remove this line to avoid confusion? Because the snapshot is not required to commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1133082100


##########
data/src/test/java/org/apache/iceberg/data/TestGenericTaskWriter.java:
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.time.OffsetDateTime;
+import java.util.Collections;
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.io.OutputFileFactory;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.StructLikeSet;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestGenericTaskWriter {
+  @Rule public final TemporaryFolder temp = new TemporaryFolder();
+
+  private final Schema schema =
+      new Schema(
+          Types.NestedField.required(1, "level", Types.StringType.get()),
+          Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+          Types.NestedField.required(3, "message", Types.StringType.get()),
+          Types.NestedField.optional(
+              4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())));
+
+  private final PartitionSpec spec =
+      PartitionSpec.builderFor(schema).hour("event_time").identity("level").build();
+
+  private Table table;
+  private List<Record> testRecords = Lists.newArrayList();
+  private final String testFileFormat;
+
+  @Parameterized.Parameters(name = "FileFormat = {0}")
+  public static Object[][] parameters() {
+    return new Object[][] {{"avro"}, {"orc"}, {"parquet"}};
+  }
+
+  public TestGenericTaskWriter(String fileFormat) throws IOException {
+    this.testFileFormat = fileFormat;
+  }
+
+  @Before
+  public void createTable() throws IOException {
+    File testWareHouse = temp.newFolder();
+    if (testWareHouse.exists()) {
+      Assert.assertTrue(testWareHouse.delete());
+    }
+
+    Catalog catalog = new HadoopCatalog(new Configuration(), testWareHouse.getPath());
+    this.table =
+        catalog.createTable(
+            TableIdentifier.of("logging", "logs"),
+            schema,
+            spec,
+            Collections.singletonMap("write.format.default", testFileFormat));
+  }
+
+  // test write and java API write data code demo
+  private void writeRecords() throws IOException {
+    GenericAppenderFactory appenderFactory =
+        new GenericAppenderFactory(table.schema(), table.spec());
+
+    FileFormat fileFormat =
+        FileFormat.valueOf(
+            table.properties().getOrDefault("write.format.default", "parquet").toUpperCase());
+
+    int partitionId = 1;

Review Comment:
      * @param partitionId First part of the file name
      * @param taskId Second part of the file name
      * @param operationId Third part of the file name
   used to generate file path.  Can you show me the code you used to write it?  `PartitionSpec` should be set table.spec().



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1413084268

   @pauetpupa Glad this helped you,  about SortOrder i don't know much,  i think GenericTaskWriter not support write order, it use RollingFileWriter to write data directly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1413086378

   @jackye1995 Can I trouble you if you have time to do a review?  about Data: java api add GenericTaskWriter .  thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pauetpupa commented on pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "pauetpupa (via GitHub)" <gi...@apache.org>.
pauetpupa commented on PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#issuecomment-1412417816

   This demo really helped me, thank you so much!!!
   But at this point I have one question:
   
   How can I modify a little bit the code (I don't know if the `genericTaskWriter` or what) to allow writing records that into already closed files?
   In my case, I receive multiple values unsorted so in the case of ingesting them using spark I enable the fanout (or I sort them first). Is there a way using the the java api to do the same? or should I parse all the values first and sort them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Data: java api add GenericTaskWriter and add write demo to Doc.

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1090123147


##########
docs/java-api.md:
##########
@@ -147,6 +147,59 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+/**
+ * The structure of this table is the same as the demo table in java-api-quickstart
+ *
+ * Schema schema = new Schema(
+ *       Types.NestedField.required(1, "level", Types.StringType.get()),
+ *       Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
+ *       Types.NestedField.required(3, "message", Types.StringType.get()),
+ *       Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
+ *   );
+ * PartitionSpec spec = PartitionSpec.builderFor(schema)
+ *       .hour("event_time")
+ *       .identity("level")
+ *       .build();
+ */
+
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());

Review Comment:
   What if we put this as a test, and directly reference the test as example in the doc page? By doing that the code can be up to date with latest API and checkstyle standards.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1083463953


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
+final InternalRecordWrapper recordWrapper = new InternalRecordWrapper(table.schema().asStruct());
+
+// partitionedFanoutWriter will auto partitioned record and create the partitioned writer
+PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<Record>(table.spec(), FileFormat.PARQUET, appenderFactory, outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {
+    @Override
+    protected PartitionKey partition(Record record) {
+        partitionKey.partition(recordWrapper.wrap(record));
+        return partitionKey;
+    }
+};
+
+GenericRecord genericRecord = GenericRecord.create(table.schema());
+
+// assume write 1000 records
+for (int i = 0; i < 1000; i++) {
+    GenericRecord record = genericRecord.copy();
+    record.setField("level",  i % 6 == 0 ? "error" : "info");

Review Comment:
   > we might want to specify the schema of the records you insert before the code block so people can understand it more easily
   
   The table structure has been described in the code comments



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1083463683


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());

Review Comment:
   sorry for reply later. that's a good idea, I also don't think it's good to let users directly touch InternalRecordWrapper and PartitionedFanoutWriter, for TestTaskWriter, I read its writing logic.
   so add a new TaskWriter interface implementation class,  name is GenericTaskWriter.  It supports writing of non-partitioned and partitioned tables,   What do you think of this, I have committed to this branch. @jackye1995 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] youngxinler commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by "youngxinler (via GitHub)" <gi...@apache.org>.
youngxinler commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1083463759


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
+final InternalRecordWrapper recordWrapper = new InternalRecordWrapper(table.schema().asStruct());
+
+// partitionedFanoutWriter will auto partitioned record and create the partitioned writer
+PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<Record>(table.spec(), FileFormat.PARQUET, appenderFactory, outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {

Review Comment:
   done with GenericTaskWriter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6571: Docs: java api doc add write data example

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6571:
URL: https://github.com/apache/iceberg/pull/6571#discussion_r1081542733


##########
docs/java-api.md:
##########
@@ -147,6 +147,53 @@ t.newAppend().appendFile(data).commit();
 t.commitTransaction();
 ```
 
+### WriteData
+
+The java api can write data into iceberg table.
+
+First write data to the data file, then submit the data file, the data you write will take effect in the table.
+
+For example, add 1000 pieces of data to the table.
+
+```java
+GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema(), table.spec());
+
+int partitionId = 1, taskId = 1;
+OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId).format(FileFormat.PARQUET).build();
+final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
+final InternalRecordWrapper recordWrapper = new InternalRecordWrapper(table.schema().asStruct());
+
+// partitionedFanoutWriter will auto partitioned record and create the partitioned writer
+PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<Record>(table.spec(), FileFormat.PARQUET, appenderFactory, outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {

Review Comment:
   I think we might need example for both partitioned and not partitioned



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org