Posted to issues@flink.apache.org by Xazax-hun <gi...@git.apache.org> on 2016/03/06 11:16:29 UTC

[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

GitHub user Xazax-hun opened a pull request:

    https://github.com/apache/flink/pull/1769

    [FLINK-3322] MemoryManager creates too much GC pressure with iterative jobs.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xazax-hun/flink MemoryManager

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1769
    
----
commit 84e43270415b1a83590e374b6a5440e73e91dad1
Author: Gabor Horvath <xa...@gmail.com>
Date:   2016-03-05T08:37:32Z

    [FLINK-3322] Added the test case of ggevay to reproduce the performance
    issue.

commit 3fcce08cffae95f48855b58a576603712f522e67
Author: Gabor Horvath <xa...@gmail.com>
Date:   2016-03-05T13:36:31Z

    [FLINK-3322] MemoryManager creates too much GC pressure with iterative jobs.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

Posted by Xazax-hun <gi...@git.apache.org>.
Github user Xazax-hun commented on the pull request:

    https://github.com/apache/flink/pull/1769#issuecomment-194419456
  
    I think the soft references solution is not worth investigating, and I agree that the best way to solve the problem is to make the operators smarter for iterative jobs. Do you want to merge this pull request as a temporary fix until the other solution materializes? If the answer is no, I might give this Jira issue back to someone else so that I can focus on serialization (if the community accepts me to work on that).



[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1769#issuecomment-194250033
  
    I think the right way to solve that would actually not be in the MemoryManager (to cache), but in the operators that know that they will need the memory again and again and will hold onto it.
    
    For another feature in the code, I am currently changing the access to memory such that it goes through "allocators". These are used by, for example, the sorter to get its memory segments from the memory manager. It would be simple to create them in the batch operators such that they don't release segments to the memory manager immediately, but only after the operator is disposed. That means memory segments will be held onto through all iterations.
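
A minimal sketch of the retaining-allocator idea described above, under stated assumptions: `SimpleMemoryManager` and `RetainingAllocator` are hypothetical names invented for illustration and are not Flink's actual `MemoryManager` or allocator API, and plain `byte[]` pages stand in for memory segments. The allocator hands pages back to the manager only in `dispose()`, so pages are reused across iterations instead of being reallocated each time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class RetainingAllocatorSketch {

    /** Hypothetical stand-in for a memory manager; not Flink's MemoryManager API. */
    static final class SimpleMemoryManager {
        private final int pageSize;
        private int freshAllocations = 0;

        SimpleMemoryManager(int pageSize) {
            this.pageSize = pageSize;
        }

        byte[] allocatePage() {
            freshAllocations++;
            return new byte[pageSize];
        }

        void release(byte[] page) {
            // In this sketch a released page simply becomes garbage.
        }

        int freshAllocations() {
            return freshAllocations;
        }
    }

    /** Hypothetical allocator that keeps its pages until the operator is disposed. */
    static final class RetainingAllocator {
        private final SimpleMemoryManager manager;
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        private final List<byte[]> owned = new ArrayList<>();

        RetainingAllocator(SimpleMemoryManager manager) {
            this.manager = manager;
        }

        byte[] allocate() {
            byte[] page = free.poll();
            if (page == null) {
                // Nothing retained yet: fall back to the memory manager.
                page = manager.allocatePage();
                owned.add(page);
            }
            return page;
        }

        /** Called at the end of an iteration: keep the page instead of releasing it. */
        void giveBack(byte[] page) {
            free.push(page);
        }

        /** Called once when the operator is disposed: release everything. */
        void dispose() {
            for (byte[] page : owned) {
                manager.release(page);
            }
            owned.clear();
            free.clear();
        }
    }

    /** Simulates an iterative job and returns how many fresh pages were created. */
    static int run(int iterations, int pagesPerIteration) {
        SimpleMemoryManager manager = new SimpleMemoryManager(32 * 1024);
        RetainingAllocator allocator = new RetainingAllocator(manager);
        List<byte[]> inUse = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            for (int p = 0; p < pagesPerIteration; p++) {
                inUse.add(allocator.allocate());
            }
            for (byte[] page : inUse) {
                allocator.giveBack(page);
            }
            inUse.clear();
        }
        allocator.dispose();
        return manager.freshAllocations();
    }

    public static void main(String[] args) {
        System.out.println("fresh pages allocated: " + run(10, 4));
    }
}
```

In this sketch the number of fresh allocations depends only on the peak number of pages needed in one iteration, not on the number of iterations. The trade-off is that the memory stays reserved for the operator's whole lifetime, even if later iterations need less.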



[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

Posted by ggevay <gi...@git.apache.org>.
Github user ggevay commented on the pull request:

    https://github.com/apache/flink/pull/1769#issuecomment-193635557
  
    The Memory tab in Java Mission Control can probably help with investigating this.



[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

Posted by ggevay <gi...@git.apache.org>.
Github user ggevay commented on the pull request:

    https://github.com/apache/flink/pull/1769#issuecomment-192962635
  
    > I would be curious about the soft reference implementation as in DeltaIteration cases I think it is a valid situation that the job needs less and less memory.
    
    Are there any operators that dynamically adjust how much memory they take based on the amount of input data?



[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

Posted by mbalassi <gi...@git.apache.org>.
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/1769#issuecomment-192954621
  
    I would be curious about the soft reference implementation, as in DeltaIteration cases I think it is a valid situation that the job needs less and less memory. Please add the license header to the manual test file.



[GitHub] flink pull request: [FLINK-3322] MemoryManager creates too much GC...

Posted by Xazax-hun <gi...@git.apache.org>.
Github user Xazax-hun commented on the pull request:

    https://github.com/apache/flink/pull/1769#issuecomment-193443590
  
    > I would be curious about the soft reference implementation as in DeltaIteration cases I think it is a valid situation that the job needs less and less memory. Please add the licence header to the manual test file.
    
    This is my first attempt to use a "soft reference pool":
    https://github.com/Xazax-hun/flink/commit/2694910e53b2f86412f2a9c3e4d83cf1705e3c65
    
    I could not measure any performance gain compared to the code before this pull request as a baseline. Hopefully I will have some extra time tomorrow, so I can further investigate whether there is something wrong with my first implementation or the approach.
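
For readers unfamiliar with the technique, a soft reference pool can be sketched roughly as follows. This is an illustrative assumption, not the code in the linked commit: `SoftPagePool` is a hypothetical name, and plain `byte[]` pages stand in for memory segments. Released pages are held only through `java.lang.ref.SoftReference`, so the garbage collector may reclaim them under memory pressure, while an uncollected page can be reused without a fresh allocation.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayDeque;

public class SoftReferencePoolSketch {

    /** Hypothetical pool of pages held via soft references. */
    static final class SoftPagePool {
        private final int pageSize;
        private final ArrayDeque<SoftReference<byte[]>> pool = new ArrayDeque<>();
        private int freshAllocations = 0;

        SoftPagePool(int pageSize) {
            this.pageSize = pageSize;
        }

        byte[] acquire() {
            SoftReference<byte[]> ref;
            while ((ref = pool.poll()) != null) {
                byte[] page = ref.get();
                if (page != null) {
                    return page; // still alive: reuse it
                }
                // Referent was cleared by the GC: drop the stale entry and keep looking.
            }
            freshAllocations++;
            return new byte[pageSize];
        }

        void release(byte[] page) {
            // The pool keeps no strong reference, so the GC may reclaim the page.
            pool.push(new SoftReference<>(page));
        }

        int freshAllocations() {
            return freshAllocations;
        }
    }

    public static void main(String[] args) {
        SoftPagePool pool = new SoftPagePool(32 * 1024);
        byte[] first = pool.acquire();
        pool.release(first);
        byte[] second = pool.acquire();
        // 'first' is still strongly reachable here, so the pooled reference
        // cannot have been cleared and the same page is handed out again.
        System.out.println("reused: " + (first == second));
        System.out.println("fresh allocations: " + pool.freshAllocations());
    }
}
```

Whether such a pool helps in practice depends on how eagerly the JVM clears soft references, which may be one reason a gain is hard to measure.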

