
Posted to issues@flink.apache.org by "Caizhi Weng (Jira)" <ji...@apache.org> on 2020/02/26 03:53:00 UTC

[jira] [Issue Comment Deleted] (FLINK-14807) Add Table#collect api for fetching data to client

     [ https://issues.apache.org/jira/browse/FLINK-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caizhi Weng updated FLINK-14807:
--------------------------------
    Comment: was deleted

(was: Thanks for the comment [~godfreyhe]. I'll list more details about my design below.
 * How can sink tell the REST server its address and port?
 >> This is the hardest part of the design, and I haven't come up with a really good solution yet. I currently have three options:
 ** Option A. [FLIP-27|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface] will introduce Operator Coordinator which can perfectly solve this issue. But according to [~jqin] we won't have this feature until 1.11.
 ** Option B. We can use accumulators to tell the job manager the address and port of the socket server. But as accumulators are sent with heartbeats and the default interval between heartbeats is 10s, the client could wait up to 10s before learning the address, which would greatly impact small jobs.
 ** Option C. Extract the TaskConfig from the JobGraph in the job manager and insert the server information into it. This requires the socket server to start in the job manager instead of in the sink, and it seems quite hacky...
 * What if a job without ordering restarts?
 >> As streaming jobs are backed by checkpoints, this is not a problem. For batch jobs, I'm afraid we'll have to introduce a special element in the resulting iterator indicating that the previously provided results are now invalid.)
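To make the batch-restart idea concrete, here is a minimal, self-contained sketch (plain Java, no Flink dependency) of a client consuming such an iterator. All names (ResultEvent, Data, Invalidate, materialize) are hypothetical and not part of any Flink API. The sketch assumes the server emits an Invalidate marker whenever a failed batch attempt is restarted, and that the restarted attempt re-sends all of its results from the beginning.

```java
import java.util.ArrayList;
import java.util.List;

public class InvalidatingResultDemo {

    /** One element delivered by a hypothetical collect() iterator. */
    interface ResultEvent<T> {}

    /** A regular result row. */
    static final class Data<T> implements ResultEvent<T> {
        final T value;

        Data(T value) {
            this.value = value;
        }
    }

    /** Hypothetical marker: all rows delivered so far are invalid (the batch job restarted). */
    static final class Invalidate<T> implements ResultEvent<T> {}

    /** Builds the client-side view, honouring the invalidation marker. */
    static <T> List<T> materialize(Iterable<ResultEvent<T>> events) {
        List<T> results = new ArrayList<>();
        for (ResultEvent<T> event : events) {
            if (event instanceof Invalidate) {
                // Drop everything produced by the failed attempt;
                // the restarted attempt is assumed to re-send its rows.
                results.clear();
            } else if (event instanceof Data) {
                results.add(((Data<T>) event).value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<ResultEvent<String>> events = List.of(
                new Data<>("a"), new Data<>("b"),
                new Invalidate<>(),
                new Data<>("a"), new Data<>("b"), new Data<>("c"));
        System.out.println(materialize(events)); // prints [a, b, c]
    }
}
```

Clearing the whole buffer is the simplest policy consistent with the proposal. A caller that has already acted on the discarded rows cannot undo that, which is why the marker would need to be surfaced to users explicitly rather than hidden inside the iterator.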

> Add Table#collect api for fetching data to client
> -------------------------------------------------
>
>                 Key: FLINK-14807
>                 URL: https://issues.apache.org/jira/browse/FLINK-14807
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>    Affects Versions: 1.9.1
>            Reporter: Jeff Zhang
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>         Attachments: table-collect.png
>
>
> Currently, it is very inconvenient for users to fetch the data of a Flink job: they must specify a sink explicitly and then fetch the data from that sink via its API (e.g. write to an HDFS sink, then read the data back from HDFS). However, most of the time users just want to get the data and do whatever processing they want. So it is very necessary for Flink to provide a Table#collect API for this purpose.
>  
> Other APIs such as Table#head and Table#print are also helpful.
>  
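The helper APIs requested above could be layered on top of a collect()-style iterator. The sketch below assumes a hypothetical Iterator<T> of result rows and shows one possible semantics for head (take at most n rows without draining the rest) and print (render every row, one per line). The method names mirror the proposed Table#head and Table#print but are illustrative only, not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CollectHelpersSketch {

    /** Hypothetical Table#head: take at most n rows, leaving the rest unread. */
    static <T> List<T> head(Iterator<T> rows, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        List<T> out = new ArrayList<>();
        while (out.size() < n && rows.hasNext()) {
            out.add(rows.next());
        }
        return out;
    }

    /** Hypothetical Table#print: render every row on its own line. */
    static <T> String print(Iterator<T> rows) {
        StringBuilder sb = new StringBuilder();
        while (rows.hasNext()) {
            sb.append(rows.next()).append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Alice", "Bob", "Carol");
        System.out.println(head(rows.iterator(), 2)); // prints [Alice, Bob]
        System.out.print(print(rows.iterator()));
    }
}
```

Because head stops reading early, a real implementation would presumably also need to cancel the underlying job once n rows have arrived; that part is outside this sketch.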



--
This message was sent by Atlassian Jira
(v8.3.4#803005)