
Posted to user@flink.apache.org by Jason Yi <93...@gmail.com> on 2022/01/20 17:43:44 UTC

Question - Filesystem connector for lookup table

Hello,

I have data sets in S3 and want to use them as lookup tables in Flink. I
defined tables with the filesystem connector and joined them to a table
defined with the Kinesis connector in my Flink application. I expected the
output to be written to S3, but no data was written to the sink table.

According to the Flink doc (
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/overview/#supported-connectors),
filesystem is available for a lookup source. I wonder if this is true.

If the filesystem connector is not available for lookup tables, is there
any alternative way to use data from s3 as a lookup table in Flink?

Flink version: 1.14.0 (on EMR 6.5)
Kinesis source table: a watermark was defined.
Lookup data: CSV data in s3.
Sink table: Hudi connector

Please let me know if I'm missing anything.

Thanks in advance.
Jason.

Re: Question - Filesystem connector for lookup table

Posted by Martijn Visser <ma...@ververica.com>.

Hi Jason,

The best option would indeed be to make the dimension data available in
something like a database which you can access via JDBC, HBase or Hive.
Those do support lookups.

Best regards,

Martijn
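
For readers of the archive, the JDBC option mentioned above might look roughly
like the following. This is only a sketch under stated assumptions: the table
names (dim_customer, events), the columns, the JDBC URL and the credentials are
all hypothetical and do not come from this thread. It assumes the dimension
data has already been loaded into a database, and that the streaming table
(here called events) has a processing-time column declared as
proc_time AS PROCTIME(), which the lookup join syntax needs. The statements are
kept as plain strings so they can be passed to PyFlink's
TableEnvironment.execute_sql.

```python
# Sketch only: every table name, column, URL and credential is hypothetical.

# Dimension table backed by the JDBC connector, which supports lookups.
# The two lookup.cache.* options are optional and enable a local lookup cache.
DIM_TABLE_DDL = """
CREATE TABLE dim_customer (
  customer_id STRING,
  customer_name STRING,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/refdata',
  'table-name' = 'dim_customer',
  'username' = 'flink',
  'password' = 'change-me',
  'lookup.cache.max-rows' = '10000',
  'lookup.cache.ttl' = '10min'
)
"""

# Lookup join: each row of `events` is enriched by querying dim_customer
# as of the row's processing time. `events` must already be registered and
# must expose a processing-time attribute named proc_time.
LOOKUP_JOIN_SQL = """
SELECT e.event_id, e.customer_id, d.customer_name
FROM events AS e
JOIN dim_customer FOR SYSTEM_TIME AS OF e.proc_time AS d
  ON e.customer_id = d.customer_id
"""


def submit(table_env):
    """Run both statements on an object that offers execute_sql(str)."""
    table_env.execute_sql(DIM_TABLE_DDL)
    return table_env.execute_sql(LOOKUP_JOIN_SQL)
```

With a real PyFlink TableEnvironment, submit(table_env) would register the
dimension table and start the enrichment query. Note that the join is written
against a processing-time attribute rather than the watermark column.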


Re: Question - Filesystem connector for lookup table

Posted by Jason Yi <93...@gmail.com>.

Thanks for the quick response.

Is there a recommended practice for the case where we have data sets in a
filesystem that we want to use in Flink as reference data (like dimension
data)?

   - Would making the dimension data a Hive table, or loading it into a table
   in an RDBMS (like MySQL), be the best option for this use case?
   - Or should we consider a staging area where Flink's output would be
   stored, and then have another application (like Spark) join Flink's
   output to the dimension data?

Jason.
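
To make the first option above concrete, here is a minimal sketch of loading a
CSV dimension file into a relational table. It is an illustration under
assumptions, not something taken from this thread: sqlite3 stands in for MySQL
so the example runs without a server, and the table name dim_customer and the
columns customer_id and customer_name are hypothetical. In practice the CSV
would first be read from S3, and the upsert statement would need the target
database's own syntax.

```python
import csv
import io
import sqlite3


def load_dimension_csv(conn, csv_file):
    """Upsert rows from a CSV file object into dim_customer; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id TEXT PRIMARY KEY, customer_name TEXT)"
    )
    rows = [
        (record["customer_id"], record["customer_name"])
        for record in csv.DictReader(csv_file)
    ]
    # INSERT OR REPLACE is SQLite syntax; MySQL would use a different upsert.
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for a CSV object fetched from S3.
sample = io.StringIO("customer_id,customer_name\nc1,Alice\nc2,Bob\n")
conn = sqlite3.connect(":memory:")
loaded = load_dimension_csv(conn, sample)

# A point lookup by key, which is the access pattern a lookup join relies on.
name = conn.execute(
    "SELECT customer_name FROM dim_customer WHERE customer_id = ?", ("c2",)
).fetchone()[0]
```

Re-running the load replaces rows with the same key, so a periodic refresh of
the reference data would not create duplicates.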


Re: Question - Filesystem connector for lookup table

Posted by Martijn Visser <ma...@ververica.com>.

Hi Jason,

It's not (properly) supported and we should update the documentation.

As far as I know, there is no out-of-the-box way to use a file on a
filesystem as a lookup table.

Best regards,

Martijn

-- 

Martijn Visser | Product Manager

martijn@ververica.com

<https://www.ververica.com/>


Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time