Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/05 19:04:47 UTC

[GitHub] [iceberg] stevenzwu edited a comment on issue #2208: IcebergTableSink to write data into multiple iceberg tables

stevenzwu edited a comment on issue #2208:
URL: https://github.com/apache/iceberg/issues/2208#issuecomment-774178386


   Yeah, a single Kafka producer/sink supports writing to multiple Kafka topics, as long as they are all on the same Kafka cluster, so Kafka handles this case comfortably. It is not without penalty, though: producer batching happens per topic-partition, so fanning records out across many topics dilutes batching and increases disk I/O on the broker side.
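   
   For illustration, here is a minimal sketch of that pattern (the broker address, topic names, and payloads are all made up), showing one producer fanning out to several topics on the same cluster:
   
   ```java
   import java.util.Properties;
   import org.apache.kafka.clients.producer.KafkaProducer;
   import org.apache.kafka.clients.producer.ProducerRecord;
   import org.apache.kafka.common.serialization.StringSerializer;
   
   public class MultiTopicProducer {
       public static void main(String[] args) {
           Properties props = new Properties();
           // Hypothetical broker address; all target topics must live on this cluster.
           props.put("bootstrap.servers", "broker-1:9092");
           props.put("key.serializer", StringSerializer.class.getName());
           props.put("value.serializer", StringSerializer.class.getName());
   
           try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
               // The topic is chosen per record, so one producer fans out to many topics.
               producer.send(new ProducerRecord<>("orders", "k1", "order-payload"));
               producer.send(new ProducerRecord<>("payments", "k2", "payment-payload"));
               // Each additional topic spreads records over more topic-partitions,
               // shrinking the average batch size -- the batching penalty noted above.
           }
       }
   }
   ```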
   
   It is very expensive (and maybe impractical) for a single Iceberg sink to support a large and growing number of tables. The writers would need to keep many files open at once, which could create memory pressure in the writer tasks. When it is time to checkpoint and commit, the writers would have to flush and upload files for hundreds of tables, and the committer would have to commit hundreds of tables; that would be very slow. I would suggest doing the demux upstream, before the data reaches the Iceberg sink jobs.
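   
   If it helps, here is a minimal sketch of that demux step using Flink side outputs. The `Event` type, the dataset names, and the `print()` stand-in sinks are all made up; in a real setup each routed stream would go to its own Kafka topic or dedicated Iceberg sink:
   
   ```java
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
   import org.apache.flink.streaming.api.functions.ProcessFunction;
   import org.apache.flink.util.Collector;
   import org.apache.flink.util.OutputTag;
   
   public class DemuxJob {
   
       // Hypothetical record type: a payload tagged with its target dataset/table.
       public static class Event {
           public String table;
           public String payload;
       }
   
       // One side-output tag per known dataset (anonymous subclass keeps type info).
       static final OutputTag<Event> ORDERS = new OutputTag<Event>("orders") {};
       static final OutputTag<Event> PAYMENTS = new OutputTag<Event>("payments") {};
   
       public static void wire(DataStream<Event> input) {
           SingleOutputStreamOperator<Event> routed = input.process(
               new ProcessFunction<Event, Event>() {
                   @Override
                   public void processElement(Event e, Context ctx, Collector<Event> out) {
                       // Route each record to the side output for its dataset.
                       if ("orders".equals(e.table)) {
                           ctx.output(ORDERS, e);
                       } else if ("payments".equals(e.table)) {
                           ctx.output(PAYMENTS, e);
                       } else {
                           out.collect(e); // unknown datasets fall through to a catch-all
                       }
                   }
               });
   
           // Each per-dataset stream gets its own sink (print() stands in for a
           // per-dataset Kafka topic or a dedicated Iceberg sink), so each Iceberg
           // job only ever flushes and commits a single table at checkpoint time.
           routed.getSideOutput(ORDERS).print("orders");
           routed.getSideOutput(PAYMENTS).print("payments");
       }
   }
   ```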
   
   Also, if you have a single Kafka topic holding a varied and growing number of datasets, you lose the benefit of schema validation when ingesting data into Kafka. Having a separate Kafka topic with schema validation for each dataset may also help with data quality.
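   
   One common way to get that per-topic validation is a schema registry. A minimal sketch, assuming Confluent Schema Registry with its default per-topic subject naming (`<topic>-value`); the broker/registry addresses and the `Order` schema are made up for illustration:
   
   ```java
   import java.util.Properties;
   import org.apache.kafka.clients.producer.KafkaProducer;
   import org.apache.kafka.clients.producer.ProducerRecord;
   // Confluent's Avro serializer registers/validates the record schema against
   // the subject derived from the target topic.
   import io.confluent.kafka.serializers.KafkaAvroSerializer;
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericData;
   import org.apache.avro.generic.GenericRecord;
   
   public class PerTopicSchemaProducer {
       public static void main(String[] args) {
           Properties props = new Properties();
           props.put("bootstrap.servers", "broker-1:9092");          // assumed address
           props.put("schema.registry.url", "http://registry:8081"); // assumed address
           props.put("key.serializer",
               "org.apache.kafka.common.serialization.StringSerializer");
           props.put("value.serializer", KafkaAvroSerializer.class.getName());
   
           // Hypothetical dataset schema; one topic <-> one schema subject.
           Schema schema = new Schema.Parser().parse(
               "{\"type\":\"record\",\"name\":\"Order\",\"fields\":" +
               "[{\"name\":\"id\",\"type\":\"string\"}]}");
           GenericRecord record = new GenericData.Record(schema);
           record.put("id", "o-1");
   
           try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
               // A record whose schema is incompatible with the "orders-value" subject
               // fails at serialization time, keeping bad data out of the topic.
               producer.send(new ProducerRecord<>("orders", "o-1", record));
           }
       }
   }
   ```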


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org