Posted to user@flink.apache.org by Robert Cullen <ci...@gmail.com> on 2021/09/10 17:01:51 UTC

Create a lookup table in a StreamExecutionEnvironment

I have a developer that wants to create a lookup table in Kafka with data
that will be used later when sinking with S3.  The lookup table will have
folder information that will be used as a Bucket Assigner in the
StreamingFileSink.

My thought was to use the Table API to generate the lookup table and build a
collection for caching, then query the cache with an ID that is being
streamed.  But it appears that you cannot mix the two APIs (DataStream, Table).
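To make the idea concrete, here is a minimal plain-Java sketch of that caching approach (class and method names are hypothetical, not from any Flink API): an in-memory map from record ID to target folder, with a fallback bucket for unknown IDs. In a real job this map would back a custom `BucketAssigner` passed to the `StreamingFileSink` builder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup cache: maps a streamed record's ID to an S3 folder.
// In a Flink job, a custom BucketAssigner<T, String> would consult this
// cache in its getBucketId() method to pick the output folder per record.
public class FolderLookupCache {
    private final Map<String, String> idToFolder = new HashMap<>();

    // Populate the cache, e.g. from the Kafka lookup topic.
    public void put(String id, String folder) {
        idToFolder.put(id, folder);
    }

    // Resolve the bucket for a record; fall back to a default folder
    // when the ID is not (yet) present in the cache.
    public String bucketFor(String id) {
        return idToFolder.getOrDefault(id, "unknown");
    }
}
```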

Any ideas?  thanks!

-- 
Robert Cullen
240-475-4490

Re: Create a lookup table in a StreamExecutionEnvironment

Posted by JING ZHANG <be...@gmail.com>.
Hi Robert,
First of all, the built-in Kafka connector source is not a
`LookupTableSource`. To use Kafka as a lookup table, you need to
implement a user-defined source [1].

Secondly, about how to define a user-defined lookup table source for Kafka:
I'm not an expert in Kafka, so please correct me if I'm wrong.
Does Kafka support looking up records by a lookup key, like other key-value
storage (HBase/Redis/...)?
If yes, you could refer to the implementation of `HBaseDynamicTableSource`
to implement the user-defined lookup table source. You could either query
the external storage for every lookup request, or introduce a cache to
avoid overly frequent RPCs.
If not, you could refer to the implementation of `HiveLookupTableSource`,
which caches all data of the dimension table and refreshes the cache at
intervals.
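The second (Hive-style) strategy can be sketched in plain Java, independent of Flink: a cache that reloads the entire dimension table whenever its TTL expires. All names here are hypothetical; the `loader` stands in for whatever scans the Kafka topic or other storage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of a refresh-all-at-intervals lookup cache,
// similar in spirit to how HiveLookupTableSource caches dimension data.
public class RefreshingLookupCache {
    private final Supplier<Map<String, String>> loader; // loads the whole dimension table
    private final long ttlMillis;                       // refresh interval
    private volatile long lastLoad = 0L;
    private volatile Map<String, String> snapshot = Map.of();

    public RefreshingLookupCache(Supplier<Map<String, String>> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public String lookup(String key) {
        long now = System.currentTimeMillis();
        if (now - lastLoad >= ttlMillis) {
            // TTL expired: reload every row of the dimension table at once.
            snapshot = new ConcurrentHashMap<>(loader.get());
            lastLoad = now;
        }
        return snapshot.get(key); // null when the key is absent
    }
}
```

This trades freshness for throughput: lookups between refreshes never hit the external system, which suits slowly changing dimension data.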

Thirdly, you could create a lookup table by using the Table API directly or
by switching to SQL DDL. Please refer to [2] for more information and demos.
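For the SQL DDL route, a registration could look roughly like this (table name, columns, and topic are hypothetical; note that registering the table this way does not by itself make the connector a `LookupTableSource`):

```sql
-- Hypothetical schema: adjust columns, topic, and broker address to your setup.
CREATE TABLE folder_lookup (
  id STRING,
  folder STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'folder-lookup',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```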

Hope it is helpful to you.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/sourcessinks/
[2]
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/common/#connector-tables

Best,
JING ZHANG

Robert Cullen <ci...@gmail.com> wrote on Sat, Sep 11, 2021 at 1:02 AM:

> I have a developer that wants to create a lookup table in Kafka with data
> that will be used later when sinking with S3.  The lookup table will have
> folder information that will be used as a Bucket Assigner in the
> StreamingFileSink.
>
> I thought using the Table API to generate the lookup table and build a
> collection for caching.  Then query the cache with an ID that is being
> streamed.  But it appears that you cannot mix APIs (Streaming, Table).
>
> Any ideas?  thanks!
>
> --
> Robert Cullen
> 240-475-4490
>