Posted to user@flink.apache.org by wangsan <wa...@163.com> on 2017/11/20 12:27:29 UTC

Hive integration in table API and SQL

Hi all,

I am currently learning the Table API and SQL in Flink. I noticed that Flink does not support Hive tables as a table source, and even a JDBC table source is not provided. There are cases where we need to join a stream table with static Hive or other database tables to get more specific attributes, so how can I implement this functionality? Do I need to implement my own DataSet connector to load data from external tables via JDBC and register the DataSet as a table, or should I provide an external catalog?

Thanks,
wangsan

Re: Hive integration in table API and SQL

Posted by Timo Walther <tw...@apache.org>.
Hi,

no, combining batch and streaming environments is not possible at the 
moment. However, most batch operations can be done in a streaming 
fashion as well. I would recommend using the DataStream API, as it 
provides the most flexibility for your use case.
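To make the suggested DataStream-based approach concrete, here is a minimal plain-Java sketch of a per-record lookup join, where a static dimension table (e.g. loaded once from Hive or via JDBC) enriches each streaming event. Note this is not actual Flink API code: the class and method names (`LookupJoinSketch`, `loadDimensionTable`, `enrich`) are made up for illustration; in Flink, the same logic would live in a `RichMapFunction`, with the dimension map loaded in `open()`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a per-record "lookup join": each streaming event is
// enriched with attributes from a static dimension table held in memory.
public class LookupJoinSketch {

    // Hypothetical dimension table: userId -> city, e.g. loaded once via JDBC.
    static Map<String, String> loadDimensionTable() {
        Map<String, String> dim = new HashMap<>();
        dim.put("u1", "Beijing");
        dim.put("u2", "Shanghai");
        return dim;
    }

    // Enrich one event (userId, amount) with the city attribute.
    static String enrich(String userId, long amount, Map<String, String> dim) {
        String city = dim.getOrDefault(userId, "unknown");
        return userId + "," + amount + "," + city;
    }

    public static void main(String[] args) {
        Map<String, String> dim = loadDimensionTable();
        System.out.println(enrich("u1", 100, dim));
        System.out.println(enrich("u3", 7, dim));
    }
}
```

The main design point is that the dimension data is loaded once up front, so the per-event cost is a single map lookup rather than a database round trip.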

Regards,
Timo


Am 11/21/17 um 4:41 AM schrieb wangsan:
> Hi Timo,
>
> Thanks for your reply. I do notice that the document says "A 
> Table is always bound to a specific TableEnvironment. It is not 
> possible to combine tables of different TableEnvironments in the same 
> query, e.g., to join or union them." Does that mean there is no way I 
> can perform operations, like join, on a streaming table and a batch table?
>
> Best,
> wangsan
>
>> On 20 Nov 2017, at 9:15 PM, Timo Walther <twalthr@apache.org 
>> <ma...@apache.org>> wrote:
>>
>> Timo
>


Re: Hive integration in table API and SQL

Posted by wangsan <wa...@163.com>.
Hi Timo,

Thanks for your reply. I do notice that the document says "A Table is always bound to a specific TableEnvironment. It is not possible to combine tables of different TableEnvironments in the same query, e.g., to join or union them." Does that mean there is no way I can perform operations, like join, on a streaming table and a batch table?

Best,
wangsan

> On 20 Nov 2017, at 9:15 PM, Timo Walther <tw...@apache.org> wrote:
> 
> Timo


Re: Hive integration in table API and SQL

Posted by Timo Walther <tw...@apache.org>.
Hi Wangsan,

yes, the Hive integration is limited so far. However, we provide an 
external catalog feature [0] that allows you to implement custom logic 
to retrieve Hive tables. I think it is not possible to do all of your 
operations in Flink's SQL API right now. For now, I think you need to 
combine the DataStream and SQL APIs. E.g., the Hive lookups should happen in an 
asynchronous fashion to reduce latency [1]. As far as I know, JDBC does 
not easily allow retrieving records in a streaming fashion. That's why 
there is only a TableSink but no source. Stream joining is limited so 
far: we will support window joins in the upcoming release and will likely 
provide full history joins in 1.5. The Table & SQL API is still a 
young API, but development happens quickly. If you are interested in 
contributing, feel free to write on the dev@ mailing list.
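To illustrate the asynchronous-lookup pattern referenced in [1] without pulling in Flink itself, here is a hedged plain-Java sketch: the (simulated) blocking database call is moved onto a separate executor via `CompletableFuture`, so the caller is not stalled per record. In Flink this corresponds to implementing `AsyncFunction` and applying it with `AsyncDataStream`; the names `AsyncLookupSketch`, `queryDimensionStore`, and `enrichAsync` below are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the async lookup pattern: the potentially slow lookup runs on
// its own thread pool, so many lookups can be in flight at once instead of
// blocking the event stream one record at a time.
public class AsyncLookupSketch {

    private static final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Stand-in for a slow Hive/JDBC query (contents are made up).
    static String queryDimensionStore(String key) {
        Map<String, String> fake = new HashMap<>();
        fake.put("u1", "Beijing");
        return fake.getOrDefault(key, "unknown");
    }

    // Issue the lookup without blocking the caller.
    static CompletableFuture<String> enrichAsync(String key) {
        return CompletableFuture.supplyAsync(() -> queryDimensionStore(key), pool);
    }

    public static void main(String[] args) {
        // Both lookups are in flight concurrently before we wait on either.
        CompletableFuture<String> f1 = enrichAsync("u1");
        CompletableFuture<String> f2 = enrichAsync("u9");
        System.out.println("u1 -> " + f1.join());
        System.out.println("u9 -> " + f2.join());
        pool.shutdown();
    }
}
```

The point of the pattern is throughput: with N lookups in flight, total latency approaches that of the slowest single lookup rather than the sum of all of them.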

Regards,
Timo

[0] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/common.html#register-an-external-catalog
[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html


Am 11/20/17 um 1:27 PM schrieb wangsan:
> Hi all,
>
> I am currently learning the Table API and SQL in Flink. I noticed that Flink does not support Hive tables as a table source, and even a JDBC table source is not provided. There are cases where we need to join a stream table with static Hive or other database tables to get more specific attributes, so how can I implement this functionality? Do I need to implement my own DataSet connector to load data from external tables via JDBC and register the DataSet as a table, or should I provide an external catalog?
>
> Thanks,
> wangsan