Posted to dev@iceberg.apache.org by Taher Koitawala <ta...@gmail.com> on 2022/06/27 05:38:55 UTC

FileWriterFactory Vs FileAppenderFactory

Hi All,
I am trying to build a Java service with an Iceberg writer that reads from
various sources and writes the data to a file system. I came across
these two interfaces and cannot tell when to implement which one.

Both FileWriterFactory and FileAppenderFactory have methods for creating
equality delete writers and position delete writers. Apart from that,
FileAppenderFactory also has the newDataWriter method found in
FileWriterFactory.

Could you clarify which one to implement for Parquet writes?
I would also appreciate guidance on appending to existing files that will
be pushed to S3 later, since I assume I will not be able to append to a
file once it is in S3.

Regards,
Taher Koitawala

Re: FileWriterFactory Vs FileAppenderFactory

Posted by Ryan Blue <bl...@tabular.io>.
Taher,

I typically use the helpers in the `Parquet` class to create Parquet files.
That's probably the easiest way to create individual files.

`FileWriterFactory` and `FileAppenderFactory` are ways to provide object
model support to common write patterns. Flink and Spark each use their own
in-memory model, so they provide separate factories that let the common
writers consume rows in those models. What you probably want is to
create a table using Iceberg generics or Avro objects, so `Parquet` is the
easy path for that.
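
To illustrate the idea of "object model support to common write patterns",
here is a minimal conceptual sketch in plain Java. It does not use the Iceberg
API: `RowSink`, `RowSinkFactory`, `factoryFor`, and `writeRolling` are
hypothetical names invented for this illustration, and the "files" are just
in-memory lists of strings. The point it tries to show is that one shared write
pattern (here, rolling to a new file after a fixed number of rows) can serve
several in-memory row models, as long as each model supplies its own factory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ObjectModelSketch {
    // Hypothetical stand-in for a file writer that accepts rows of type T.
    interface RowSink<T> {
        void add(T row);
        List<String> lines();
    }

    // Hypothetical factory: each in-memory object model provides its own.
    interface RowSinkFactory<T> {
        RowSink<T> newSink();
    }

    // Builds a factory from a function that encodes one row of type T.
    static <T> RowSinkFactory<T> factoryFor(Function<T, String> encoder) {
        return () -> new RowSink<T>() {
            private final List<String> out = new ArrayList<>();

            @Override
            public void add(T row) {
                out.add(encoder.apply(row));
            }

            @Override
            public List<String> lines() {
                return out;
            }
        };
    }

    // The shared write pattern: start a new "file" after maxRows rows.
    // It never needs to know what T is; the factory hides that detail.
    static <T> List<List<String>> writeRolling(
            RowSinkFactory<T> factory, Iterable<T> rows, int maxRows) {
        List<List<String>> files = new ArrayList<>();
        RowSink<T> current = null;
        int count = 0;
        for (T row : rows) {
            if (current == null) {
                current = factory.newSink();
            }
            current.add(row);
            count++;
            if (count == maxRows) {
                files.add(current.lines());
                current = null;
                count = 0;
            }
        }
        if (current != null) {
            files.add(current.lines());
        }
        return files;
    }

    public static void main(String[] args) {
        // Model A: rows held as String arrays.
        RowSinkFactory<String[]> arrayFactory =
                factoryFor((String[] r) -> r[0] + "," + r[1]);
        List<String[]> arrayRows = List.of(
                new String[] {"1", "a"},
                new String[] {"2", "b"},
                new String[] {"3", "c"});

        // Model B: rows held as maps keyed by column name.
        RowSinkFactory<Map<String, String>> mapFactory =
                factoryFor((Map<String, String> r) -> r.get("id") + "," + r.get("name"));
        List<Map<String, String>> mapRows = List.of(
                Map.of("id", "1", "name", "a"),
                Map.of("id", "2", "name", "b"),
                Map.of("id", "3", "name", "c"));

        // The same rolling logic handles both row models.
        System.out.println(writeRolling(arrayFactory, arrayRows, 2));
        System.out.println(writeRolling(mapFactory, mapRows, 2));
    }
}
```

In this sketch only `factoryFor` knows about the row type, which is roughly the
role the engine-specific factories play in the description above. A service
that writes Iceberg generics or Avro objects does not need its own factory for
that reason.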

Ryan


-- 
Ryan Blue
Tabular