Posted to user@flink.apache.org by Jasmin Redžepović <ja...@superbet.com> on 2022/01/26 13:43:23 UTC

Reading performance - Kafka VS FileSystem

Hello Flink committers :)

Just one short question:
How does the performance of reading from the Kafka source compare to reading from the FileSystem source? I would be very grateful if you could provide a short explanation.

I saw in the documentation that both provide exactly-once semantics for streaming, but this sentence about the FileSystem source got me thinking about performance: “For any repeated enumeration, the SplitEnumerator filters out previously detected files and only sends new ones to the SourceReader.” Does this filtering slow down reading as the number of files keeps growing?
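To make the quoted sentence concrete, here is a minimal sketch of what "filter out previously detected files" could look like. It is a simplified model in plain Java, not Flink's actual enumerator code: the class name SeenFileFilter, its methods, and the example S3 paths are all hypothetical, and it assumes the already-seen paths are kept in an in-memory hash set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of "only send new files"; not a Flink class. */
public class SeenFileFilter {
    // Paths that earlier enumerations have already handed out (assumed in-memory set).
    private final Set<String> alreadySeen = new HashSet<>();

    /** Returns the paths in the listing that no earlier call has returned. */
    public List<String> filterNew(List<String> listing) {
        List<String> fresh = new ArrayList<>();
        for (String path : listing) {
            // Set.add returns false when the path is already present,
            // so each check is a single hash lookup.
            if (alreadySeen.add(path)) {
                fresh.add(path);
            }
        }
        return fresh;
    }

    /** Number of distinct paths remembered so far. */
    public int seenCount() {
        return alreadySeen.size();
    }

    public static void main(String[] args) {
        SeenFileFilter filter = new SeenFileFilter();
        // First enumeration: everything is new.
        System.out.println(filter.filterNew(List.of("s3://bucket/a", "s3://bucket/b")));
        // Second enumeration: the full listing is scanned again, only "c" is new.
        System.out.println(filter.filterNew(
                List.of("s3://bucket/a", "s3://bucket/b", "s3://bucket/c")));
    }
}
```

In this model the check per path is cheap, but every repeated enumeration still walks the complete listing and the remembered set only grows, so the work per enumeration scales with the total number of files rather than with the number of new ones. Whether that matters in practice depends on the listing cost of the underlying file system and on the discovery interval, which this sketch does not measure.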

P.S. I’m new to Flink.

Thanks for your help and best regards,
Jasmin


This email is confidential and intended solely for the use of the individual or entity to whom it is addressed. If you received this e-mail by mistake, please notify the sender immediately by e-mail and delete this e-mail from your system. Please be informed that if you are not the intended recipient, you should not disseminate, distribute, disclose, copy or use this e-mail in any way, the act of dissemination, distribution, disclosure, copying or taking any action in reliance on the contents of this information being strictly prohibited. This e-mail is sent by a Superbet Group company. Any views expressed by the sender of this email are not necessarily those of Superbet Group. Please note that computer viruses can be transmitted by email. You are advised to check this email and any attachments for the presence of viruses. Superbet Group cannot accept any responsibility for any viruses transmitted by this email and/or any attachments.

Re: Reading performance - Kafka VS FileSystem

Posted by Yun Tang <my...@live.com>.
Hi Jasmin,

As far as I know, no big company uses a pure file system source as the main data source for Flink. In general, we would choose a message queue, e.g. Kafka, as the data source.

Best
Yun Tang
________________________________
From: Jasmin Redžepović <ja...@superbet.com>
Sent: Wednesday, January 26, 2022 23:13
To: user@flink.apache.org <us...@flink.apache.org>
Subject: Re: Reading performance - Kafka VS FileSystem

Also, what would you recommend? I have both options available:

  *   Kafka - protobuf messages
  *   S3 - the same messages, copied from Kafka for persistence by the Kafka Connect service


Re: Reading performance - Kafka VS FileSystem

Posted by Jasmin Redžepović <ja...@superbet.com>.
Also, what would you recommend? I have both options available:

  *   Kafka - protobuf messages
  *   S3 - the same messages, copied from Kafka for persistence by the Kafka Connect service
