Posted to user@flink.apache.org by Jason Yi <93...@gmail.com> on 2022/01/20 17:43:44 UTC
Question - Filesystem connector for lookup table
Hello,
I have data sets in S3 and want to use them as lookup tables in Flink. I
defined tables with the filesystem connector and joined them to a table
defined with the Kinesis connector in my Flink application. I expected the
output to be written to S3, but no data was written to the sink table.
According to the Flink doc (
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/#supported-connectors),
the filesystem connector can be used as a lookup source. Is that actually the case?
If the filesystem connector is not available for lookup tables, is there
any alternative way to use data from s3 as a lookup table in Flink?
Flink version: 1.14.0 (on EMR 6.5)
Kinesis source table: a watermark was defined.
Lookup data: CSV data in s3.
Sink table: Hudi connector
Please let me know if I'm missing anything.
Thanks in advance.
Jason.
Re: Question - Filesystem connector for lookup table
Posted by Martijn Visser <ma...@ververica.com>.
Hi Jason,
The best option would indeed be to make the dimension data available in
something like a database which you can access via JDBC, HBase or Hive.
Those do support lookups.
Best regards,
Martijn
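The suggestion above can be pictured with a small sketch. It is not Flink code: Python's built-in sqlite3 module stands in for the external database (the JDBC, HBase or Hive side), and the table name dim_product, the column names and the sample rows are all made up for illustration. It only shows the access pattern a lookup source has to support, which is a point query by key for each incoming record.

```python
import csv
import io
import sqlite3

# Hypothetical dimension data; in the thread this would be the CSV files in S3.
DIM_CSV = """product_id,product_name
p1,Keyboard
p2,Mouse
"""


def load_dimension_table(conn, csv_text):
    """Load the CSV rows into a keyed table so they can be queried by key."""
    conn.execute(
        "CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, product_name TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["product_id"], r["product_name"]) for r in reader]
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)
    conn.commit()


def enrich(conn, event):
    """Look up one event's key when the event is processed."""
    row = conn.execute(
        "SELECT product_name FROM dim_product WHERE product_id = ?",
        (event["product_id"],),
    ).fetchone()
    # Left-join behaviour: keep the event even when no dimension row matches.
    return {**event, "product_name": row[0] if row else None}


# sqlite3 in memory is an assumption standing in for a real external database.
conn = sqlite3.connect(":memory:")
load_dimension_table(conn, DIM_CSV)

# Hypothetical streaming events (the Kinesis side in the thread).
events = [
    {"order_id": 1, "product_id": "p1"},
    {"order_id": 2, "product_id": "p9"},  # no matching dimension row
]
enriched = [enrich(conn, e) for e in events]
```

In Flink SQL, this per-record lookup is typically written as a join using FOR SYSTEM_TIME AS OF on a processing-time attribute, against a table backed by a connector that supports lookups. The sketch covers only the query-by-key idea, not the Flink syntax or any connector configuration.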
Re: Question - Filesystem connector for lookup table
Posted by Jason Yi <93...@gmail.com>.
Thanks for the quick response.
Is there a recommended practice for the case where we have data sets in a
filesystem that we want to use in Flink as reference data (such as
dimension data)?
- Would turning the dimension data into a Hive table, or loading it into
an RDBMS table (such as MySQL), be the best option for this use case?
- Or should we write Flink's output to a staging area and have another
application (such as Spark) join that output to the dimension data?
Jason.
Re: Question - Filesystem connector for lookup table
Posted by Martijn Visser <ma...@ververica.com>.
Hi Jason,
It's not (properly) supported and we should update the documentation.
As far as I know, there is no out-of-the-box way to use a file from a
filesystem as a lookup table.
Best regards,
Martijn
--
Martijn Visser | Product Manager
martijn@ververica.com
<https://www.ververica.com/>