You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/05 13:02:13 UTC

[GitHub] [iceberg] caseylucas commented on issue #2208: IcebergTableSink to write data into multiple iceberg tables

caseylucas commented on issue #2208:
URL: https://github.com/apache/iceberg/issues/2208#issuecomment-774019528


   
   I work with Asha, so chiming in (and seeking guidance):
   
   Assumptions:
   1. There will N destination Iceberg tables that have *different schemas*.
   2. N will grow over time and may be large - probably unreasonable to have a growing number of Kafka topics.
   
   So the scenario looks like:
   ```
   1. data for all N destinations (packaged up with some discriminator)
   --->
   2. single Kafka topic
   --->
   3. Flink job (reading from Kafka source)
   --->
   4. N Iceberg tables
   ```
   
   In step 3 & 4, I assume there would need to be some demultiplexing - selecting the
   appropriate Iceberg table dynamically.  There are some options for doing this with
   Kafka (https://stackoverflow.com/questions/45391877/apache-flink-dynamic-number-of-sinks) -
   allowing runtime selection of a destination topic
   (https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/util/serialization/KeyedSerializationSchema.html),
   but the Iceberg Sink doesn't appear to currently support this scenario.
   
   Being new to Iceberg and Flink, I'm wondering if this demultiplexing needs to be implemented
   in the Iceberg Flink sink or maybe there is some built in Flink support or concept that we're
   missing.
   
   If demultiplexing needs to be built into the Flink sink, are there any issues that should be
   considered before we go down the path of implementing such a strategy?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org