
Posted to user@flink.apache.org by Hanan Yehudai <ha...@radcom.com> on 2019/11/04 13:05:56 UTC

Is Flink a database?

This seems like a controversial subject...

on purpose 😊

My data lake is made of Parquet files. Should I use Flink in batch mode to run ad hoc queries over this historical data,
or should I use a dedicated “database” such as Drill, Dremio, Hive and the like?
What advantages would Flink give me for querying this type of batch data?

Re: Is Flink a database?

Posted by Piotr Nowojski <pi...@ververica.com>.

Hi :)

What do you mean by “a database”? A SQL-like query engine? Flink is already that [1]. A place where you store the data? Flink is kind of that as well [2], and many users use Flink as the source of truth, not just as a data processing framework.

With the Flink Table API/SQL [1], you can easily query data from other systems (for example, read tables stored in a Hive Metastore). By extension, you could do the same with the DataStream API or the DataSet API.

Each of those APIs (Table API/SQL, DataStream API, DataSet API) comes with different advantages and trade-offs. The Table API/SQL, being pretty high level, gives you automatic optimisations and ease of use. The DataStream and DataSet APIs, being lower level, give you more fine-grained control over what’s happening, at the expense of requiring more knowledge from you.

As for how the Flink Table API/SQL compares to other systems, I guess it would be better if someone from the Table API/SQL team responded.

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html
