Posted to user@flink.apache.org by wangsan <wa...@163.com> on 2017/11/20 12:27:29 UTC
Hive integration in table API and SQL
Hi all,
I am currently learning the Table API and SQL in Flink. I noticed that Flink does not support Hive tables as a table source, and even a JDBC table source is not provided. There are cases where we need to join a stream table with static Hive or other database tables to get more specific attributes, so how can I implement this functionality? Do I need to implement my own dataset connectors to load data from external tables using JDBC and register the dataset as a table, or should I provide an external catalog?
Thanks,
wangsan
Re: Hive integration in table API and SQL
Posted by Timo Walther <tw...@apache.org>.
Hi,
no, combining batch and streaming environments is not possible at the
moment. However, most operations in batch can be done in streaming
fashion as well. I would recommend to use the DataStream API as it
provides the most flexibility in your use case.
Regards,
Timo
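[Editor's note: for the simple case where the static table fits in memory, the DataStream-side enrichment suggested above can be sketched in plain Java. The class name, method layout, and sample rows below are made up for illustration; in an actual Flink job the table load in open() would typically run inside a RichFlatMapFunction, one call per parallel task, e.g. via JDBC.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: enrich a stream of (userId, event) records with a
// static dimension table loaded once at startup. Plain Java is used here
// so the example stays self-contained; in Flink this would live in a
// RichFlatMapFunction, with open() doing the JDBC/Hive load.
public class StaticTableEnricher {
    private final Map<String, String> dimTable = new HashMap<>();

    // Stand-in for loading the dimension table via JDBC/Hive in open().
    public void open() {
        dimTable.put("u1", "premium");
        dimTable.put("u2", "basic");
    }

    // Join each incoming record against the in-memory dimension table.
    public String enrich(String userId, String event) {
        String tier = dimTable.getOrDefault(userId, "unknown");
        return userId + "," + event + "," + tier;
    }

    public static void main(String[] args) {
        StaticTableEnricher e = new StaticTableEnricher();
        e.open();
        System.out.println(e.enrich("u1", "click")); // u1,click,premium
        System.out.println(e.enrich("u3", "view"));  // u3,view,unknown
    }
}
```

The trade-off of this pattern is that the table is a point-in-time snapshot per task; it works only while the static side is small and changes rarely.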
Am 11/21/17 um 4:41 AM schrieb wangsan:
> Hi Timo,
>
> Thanks for your reply. I do notice that the document says "A
> Table is always bound to a specific TableEnvironment. It is not
> possible to combine tables of different TableEnvironments in the same
> query, e.g., to join or union them." Does that mean there is no way I
> can perform operations, like join, on a streaming table and a batch table?
>
> Best,
> wangsan
>
>> On 20 Nov 2017, at 9:15 PM, Timo Walther <tw...@apache.org> wrote:
>>
>> Timo
>
Re: Hive integration in table API and SQL
Posted by wangsan <wa...@163.com>.
Hi Timo,
Thanks for your reply. I do notice that the document says "A Table is always bound to a specific TableEnvironment. It is not possible to combine tables of different TableEnvironments in the same query, e.g., to join or union them." Does that mean there is no way I can perform operations, like join, on a streaming table and a batch table?
Best,
wangsan
> On 20 Nov 2017, at 9:15 PM, Timo Walther <tw...@apache.org> wrote:
>
> Timo
Re: Hive integration in table API and SQL
Posted by Timo Walther <tw...@apache.org>.
Hi Wangsan,
yes, the Hive integration is limited so far. However, we provide an
external catalog feature [0] that allows you to implement custom logic
to retrieve Hive tables. I think it is not possible to do all your
operations in Flink's SQL API right now. For now, I think you need to
combine DataStream and SQL. E.g. the Hive lookups should happen in an
asynchronous fashion to reduce latency [1]. As far as I know, JDBC does
not easily allow retrieving records in a streaming fashion. That's why
there is only a TableSink but no source. Stream joining is limited so
far. We will support window joins in the upcoming release and likely
provide full history joins in 1.5. The Table & SQL API is still a
young API but development happens quickly. If you are interested in
contributing, feel free to write to the dev@ mailing list.
Regards,
Timo
[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/common.html#register-an-external-catalog
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html
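[Editor's note: the asynchronous-lookup idea from [1] can be sketched with plain CompletableFuture. The class name and sample data below are hypothetical; in an actual Flink job this logic would sit inside an AsyncFunction handed to AsyncDataStream, as described in the Async I/O docs linked above.]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the asynchronous lookup pattern behind Flink's
// Async I/O: each record triggers a non-blocking lookup, so slow
// Hive/JDBC round-trips do not stall the stream thread. A HashMap
// stands in for the remote database to keep the example self-contained.
public class AsyncLookup {
    private static final Map<String, String> FAKE_DB = new HashMap<>();
    static { FAKE_DB.put("u1", "premium"); }

    // Simulated blocking JDBC/Hive query.
    private String queryBlocking(String key) {
        return FAKE_DB.getOrDefault(key, "unknown");
    }

    // Non-blocking wrapper: the caller's thread is free while the
    // query runs on the common pool; the future completes with the result.
    public CompletableFuture<String> lookupAsync(String key) {
        return CompletableFuture.supplyAsync(() -> queryBlocking(key));
    }

    public static void main(String[] args) throws Exception {
        AsyncLookup lookup = new AsyncLookup();
        System.out.println(lookup.lookupAsync("u1").get()); // premium
    }
}
```

In Flink the framework additionally bounds the number of in-flight requests and can preserve record order; this sketch only shows the core idea of decoupling the lookup from the stream thread.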
Am 11/20/17 um 1:27 PM schrieb wangsan:
> Hi all,
>
> I am currently learning the Table API and SQL in Flink. I noticed that Flink does not support Hive tables as a table source, and even a JDBC table source is not provided. There are cases where we need to join a stream table with static Hive or other database tables to get more specific attributes, so how can I implement this functionality? Do I need to implement my own dataset connectors to load data from external tables using JDBC and register the dataset as a table, or should I provide an external catalog?
>
> Thanks,
> wangsan