Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/03 19:57:09 UTC

[GitHub] [incubator-iceberg] rdblue commented on issue #856: [WIP] Flink Iceberg sink

rdblue commented on issue #856: [WIP] Flink Iceberg sink
URL: https://github.com/apache/incubator-iceberg/pull/856#issuecomment-608630389

> The idea is blocked by BaseTable being not serializable, making the logic have to do more, such as new Catalog and load table.

Why do you need to serialize `BaseTable`? You should only need to serialize the table's schema, partition spec, and `FileIO` instance.
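
A minimal sketch of that idea, assuming a writer task only needs a small bundle of state rather than the whole table. `WriterTaskState` and `TaskStateCodec` are hypothetical names, not Iceberg classes, and the fields are plain strings standing in for the schema, partition spec, and `FileIO` instance so the example runs without Iceberg on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical bundle of the state a writer task needs. In a real sink the
// fields would hold the table's schema, partition spec, and FileIO instance;
// strings are used here as stand-ins (an assumption of this sketch).
final class WriterTaskState implements Serializable {
    private static final long serialVersionUID = 1L;

    final String schema;
    final String partitionSpec;
    final String fileIo;

    WriterTaskState(String schema, String partitionSpec, String fileIo) {
        this.schema = schema;
        this.partitionSpec = partitionSpec;
        this.fileIo = fileIo;
    }
}

// Round-trips the bundle with standard Java serialization, which is roughly
// what happens when a job graph ships state to its tasks.
final class TaskStateCodec {
    private TaskStateCodec() {}

    static byte[] toBytes(WriterTaskState state) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static WriterTaskState fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (WriterTaskState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The table itself never crosses the task boundary in this sketch; only the three pieces named above do.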

We intentionally don't make `BaseTable` `Serializable`, to prevent two problems:

1. `Table` and `BaseTable` expose high-level operations that we don't want to be called from tasks. If it were easy to pass a table instance to tasks, it might also seem easy to commit data files from tasks in parallel, and that's a pattern we want to discourage.
2. `BaseTable` wraps `TableOperations`, which is plugged in by the catalog and can be customized. Since it is likely to be implemented outside of Iceberg, we don't want to require it to be serializable, which would make custom implementations harder to build.
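
One way to read point 1 is sketched below, under the assumption that tasks only write files and report them, while a single committer performs the table-level operation. `WrittenFile`, `WriterTask`, and `SingleCommitter` are hypothetical names for illustration and are not part of Iceberg:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical description of a data file that a task has finished writing.
final class WrittenFile {
    final String path;
    final long recordCount;

    WrittenFile(String path, long recordCount) {
        this.path = path;
        this.recordCount = recordCount;
    }
}

// Hypothetical writer task: it produces file descriptions and never commits.
final class WriterTask {
    private WriterTask() {}

    static WrittenFile run(int taskId, long recordCount) {
        // A real task would write records here; this sketch only names the file.
        return new WrittenFile("data/task-" + taskId + ".parquet", recordCount);
    }
}

// Hypothetical committer: the only place that performs a table-level commit.
final class SingleCommitter {
    private final List<String> committedPaths = new ArrayList<>();
    private int commitCount = 0;

    // Records all reported files as one commit.
    void commit(List<WrittenFile> files) {
        for (WrittenFile file : files) {
            committedPaths.add(file.path);
        }
        commitCount++;
    }

    List<String> committedPaths() {
        return Collections.unmodifiableList(committedPaths);
    }

    int commitCount() {
        return commitCount;
    }
}
```

In this arrangement the tasks have no table instance to call, so a parallel commit from tasks cannot happen by accident.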
