Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/08/01 20:30:20 UTC

[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

    [ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402759#comment-15402759 ] 

ASF GitHub Bot commented on FLINK-2090:
---------------------------------------

GitHub user mushketyk opened a pull request:

    https://github.com/apache/flink/pull/2323

    [FLINK-2090] toString of CollectionInputFormat takes long time when t…

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [x] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [x] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [x] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed
    
    …he collection is huge

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mushketyk/flink fast-to-string

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2323
    
----
commit 76c5b7dd1cf12b17b7601b2d1c8ea7cc475a031c
Author: Ivan Mushketyk <iv...@gmail.com>
Date:   2016-08-01T19:39:17Z

    [FLINK-2090] toString of CollectionInputFormat takes long time when the collection is huge

----


> toString of CollectionInputFormat takes long time when the collection is huge
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-2090
>                 URL: https://issues.apache.org/jira/browse/FLINK-2090
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Ivan Mushketyk
>            Priority: Minor
>
> The {{toString}} method of {{CollectionInputFormat}} calls {{toString}} on its underlying {{Collection}}, which in turn calls {{toString}} on every element of the collection. If the {{Collection}} contains many elements, or if the individual {{toString}} calls are slow, generating the string can take a considerable amount of time. [~mikiobraun] noticed this when he inserted several jBLAS matrices into Flink.
> The {{toString}} method is mainly used for logging statements in {{DataSourceNode}}'s {{computeOperatorSpecificDefaultEstimates}} method and in {{JobGraphGenerator.getDescriptionForUserCode}}. I'm wondering whether it is necessary to print the complete content of the underlying {{Collection}}, or whether it would be enough to print only the first 3 elements in the {{toString}} method.
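
The idea raised in the issue description (printing only the first few elements) could look roughly like the sketch below. This is not the code from the pull request; the class name BoundedToString, the helper boundedToString, and the MAX_ELEMENTS constant are hypothetical names chosen for illustration, and the example uses only the JDK rather than Flink's own classes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

public class BoundedToString {
    // Hypothetical cap; the issue only suggests "the first 3 elements".
    public static final int MAX_ELEMENTS = 3;

    // Builds a string from at most maxElements elements, so the cost no
    // longer grows with the size of the collection.
    public static String boundedToString(Collection<?> data, int maxElements) {
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = data.iterator();
        int printed = 0;
        while (it.hasNext() && printed < maxElements) {
            if (printed > 0) {
                sb.append(", ");
            }
            sb.append(it.next()); // calls toString() only on the printed elements
            printed++;
        }
        if (it.hasNext()) {
            // Signal that the remaining elements were skipped.
            if (printed > 0) {
                sb.append(", ");
            }
            sb.append("...");
        }
        sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(boundedToString(Arrays.asList(1, 2, 3, 4, 5), MAX_ELEMENTS)); // [1, 2, 3, ...]
        System.out.println(boundedToString(Arrays.asList("a", "b"), MAX_ELEMENTS));      // [a, b]
    }
}
```

Because the loop stops after maxElements iterations, elements with an expensive toString (such as large matrices) beyond the cap are never rendered. Whether 3 is the right cap, and whether an ellipsis is the right marker, are open design choices.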


