Posted to user@flink.apache.org by Robert Cullen <ci...@gmail.com> on 2021/09/10 17:01:51 UTC
Create a lookup table in a StreamExecutionEnvironment
I have a developer that wants to create a lookup table in Kafka with data
that will be used later when sinking with S3. The lookup table will have
folder information that will be used as a Bucket Assigner in the
StreamingFileSink.
I thought of using the Table API to generate the lookup table and building a
collection for caching, then querying the cache with an ID that is being
streamed. But it appears that you cannot mix APIs (Streaming, Table).
Any ideas? thanks!
--
Robert Cullen
240-475-4490
Re: Create a lookup table in a StreamExecutionEnvironment
Posted by JING ZHANG <be...@gmail.com>.
Hi Robert,
First of all, the built-in Kafka connector source is not a
`LookupTableSource`. To use Kafka as a lookup table, you need to
implement a user-defined source [1].
Secondly, about how to define a user-defined lookup table source for Kafka:
I'm not an expert in Kafka, so please correct me if I'm wrong.
Does Kafka support lookup records by a lookup key, like other Key-Value
storage (HBase/Redis/...)?
If yes, you could refer to the implementation of `HBaseDynamicTableSource`
to implement the user-defined lookup table source. You could choose to
look up the external storage for every lookup request, or introduce a cache
to avoid overly frequent RPCs.
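To illustrate that second option independently of Flink, here is a minimal sketch of the per-key lookup-with-cache pattern: a bounded LRU cache in front of a lookup function. `fetchFromStore` is a hypothetical stand-in for the real RPC to the external store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: a per-key lookup wrapped in a bounded LRU cache,
// so the external store is only contacted on a cache miss.
public class CachedLookup<K, V> {
    private final Function<K, V> fetchFromStore; // hypothetical RPC to the store
    private final Map<K, V> cache;

    public CachedLookup(Function<K, V> fetchFromStore, int maxEntries) {
        this.fetchFromStore = fetchFromStore;
        // Access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public V get(K key) {
        // Only issues the "RPC" on a cache miss.
        return cache.computeIfAbsent(key, fetchFromStore);
    }
}
```

A real lookup source would also need an eviction TTL so stale folder mappings eventually expire; this sketch only bounds the size.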
Otherwise, you could refer to the implementation of `HiveLookupTableSource`
to implement the user-defined lookup table source, which caches all the data
of the dimension table and refreshes the cache at intervals.
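The "cache everything, reload at intervals" approach could be sketched (again outside Flink, with a hypothetical `loadAll` standing in for a full scan of the dimension table) as:

```java
import java.util.Map;
import java.util.function.Supplier;

// Sketch only: holds a full snapshot of the dimension table in memory
// and reloads it once the configured TTL has elapsed. A production
// implementation would also guard against concurrent reloads.
public class RefreshingCache<K, V> {
    private final Supplier<Map<K, V>> loadAll; // hypothetical full-table scan
    private final long ttlMillis;
    private volatile Map<K, V> snapshot;
    private volatile long loadedAt;

    public RefreshingCache(Supplier<Map<K, V>> loadAll, long ttlMillis) {
        this.loadAll = loadAll;
        this.ttlMillis = ttlMillis;
        this.snapshot = loadAll.get();
        this.loadedAt = System.currentTimeMillis();
    }

    public V get(K key) {
        if (System.currentTimeMillis() - loadedAt > ttlMillis) {
            snapshot = loadAll.get(); // reload the whole table
            loadedAt = System.currentTimeMillis();
        }
        return snapshot.get(key);
    }
}
```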
Thirdly, you could create a lookup table by using the Table API directly or
by switching to SQL DDL. Please refer to [2] for more information and demos.
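For instance, a Kafka-backed table could be registered with SQL DDL roughly as follows (the table name, fields, topic, and addresses are all hypothetical; `upsert-kafka` is used here so the table can declare a primary key for the lookup id):

```sql
-- Illustrative only: a keyed lookup table backed by a compacted Kafka topic.
CREATE TABLE folder_lookup (
  id     STRING,
  folder STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'folder-lookup',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```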
Hope it is helpful to you.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/sourcessinks/
[2]
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/common/#connector-tables
Best,
JING ZHANG
Robert Cullen <ci...@gmail.com> wrote on Sat, Sep 11, 2021 at 1:02 AM:
> I have a developer that wants to create a lookup table in Kafka with data
> that will be used later when sinking with S3. The lookup table will have
> folder information that will be used as a Bucket Assigner in the
> StreamingFileSink.
>
> I thought of using the Table API to generate the lookup table and building a
> collection for caching, then querying the cache with an ID that is being
> streamed. But it appears that you cannot mix APIs (Streaming, Table).
>
> Any ideas? thanks!
>
> --
> Robert Cullen
> 240-475-4490
>