Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2019/02/26 15:54:00 UTC

[jira] [Closed] (FLINK-2239) print() on DataSet: stream results and print incrementally

     [ https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-2239.
--------------------------------
    Resolution: Won't Do

Closed because of inactivity.

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>            Priority: Major
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally calls {{collect()}} and fully materializes the set. This leads to out-of-memory errors on the client. It also leaves users with the impression that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in Flink. The easiest solution would probably be to transfer the data from the job manager to the client via the {{BlobManager}}. Alternatively, the client could connect directly to the task managers and fetch the results.
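
The contrast drawn in the description, between materializing the whole result and printing it as it arrives, can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Flink's API, and the class name IncrementalPrintSketch, both method names, and the batchSize parameter are hypothetical. An Iterator stands in for results arriving at the client, and each method returns the largest number of records it held in memory at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical sketch, not Flink code: compares client-side buffering strategies.
public class IncrementalPrintSketch {

    /** collect()-style printing: buffer every record first, then emit them all. */
    static <T> int printMaterialized(Iterator<T> results, Consumer<String> out) {
        List<T> all = new ArrayList<>();
        while (results.hasNext()) {
            all.add(results.next());
        }
        int peakBuffered = all.size(); // the whole result is held in memory
        for (T record : all) {
            out.accept(String.valueOf(record));
        }
        return peakBuffered;
    }

    /** Incremental printing: never hold more than batchSize records at once. */
    static <T> int printIncrementally(Iterator<T> results, int batchSize, Consumer<String> out) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int peakBuffered = 0;
        while (results.hasNext()) {
            batch.add(results.next());
            peakBuffered = Math.max(peakBuffered, batch.size());
            if (batch.size() == batchSize) {
                flush(batch, out);
            }
        }
        flush(batch, out); // emit the final, possibly partial, batch
        return peakBuffered;
    }

    /** Emits the buffered records in order and empties the buffer. */
    private static <T> void flush(List<T> batch, Consumer<String> out) {
        for (T record : batch) {
            out.accept(String.valueOf(record));
        }
        batch.clear();
    }

    public static void main(String[] args) {
        int peakAll = printMaterialized(IntStream.range(0, 10).iterator(), System.out::println);
        int peakBatch = printIncrementally(IntStream.range(0, 10).iterator(), 3, System.out::println);
        System.out.println("peak buffered (materialized): " + peakAll);
        System.out.println("peak buffered (incremental):  " + peakBatch);
    }
}
```

Both methods emit the same records in the same order. Only the incremental one keeps client memory bounded by the batch size rather than by the size of the result, which is the property the issue asks for.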


