Posted to dev@giraph.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/09/19 20:24:01 UTC

[jira] [Commented] (GIRAPH-1160) Fix memory estimation in MemoryEstimatorOracle

    [ https://issues.apache.org/jira/browse/GIRAPH-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172284#comment-16172284 ] 

ASF GitHub Bot commented on GIRAPH-1160:
----------------------------------------

GitHub user dlogothetis opened a pull request:

    https://github.com/apache/giraph/pull/49

    Fix bug in memory estimation

    Method MemoryEstimatorOracle.calculateRegression() exits early if the number of valid columns available for the regression differs from the total number of columns. This is wrong: the regression can still run on the valid columns alone. As a result, memory estimation is never used in practice, and out-of-core (OOC) processing starts spilling only when memory usage gets very high.
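
The column handling described above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class name, the helper methods, and the rule that a column is "valid" when it has at least one non-zero sample are all assumptions made for the example, and the regression fit itself is left out.

```java
import java.util.Arrays;

/** Illustrative sketch only; all names here are hypothetical, not Giraph's. */
public class ValidColumnRegressionSketch {

  /** Assumed validity rule: a column is usable if any sample in it is non-zero. */
  static int[] validColumns(double[][] samples) {
    int total = samples.length == 0 ? 0 : samples[0].length;
    int[] valid = new int[total];
    int count = 0;
    for (int c = 0; c < total; c++) {
      for (double[] row : samples) {
        if (row[c] != 0.0) {
          valid[count++] = c;
          break;
        }
      }
    }
    return Arrays.copyOf(valid, count);
  }

  /** Builds the reduced sample matrix that keeps only the given columns. */
  static double[][] keepColumns(double[][] samples, int[] columns) {
    double[][] reduced = new double[samples.length][columns.length];
    for (int r = 0; r < samples.length; r++) {
      for (int i = 0; i < columns.length; i++) {
        reduced[r][i] = samples[r][columns[i]];
      }
    }
    return reduced;
  }

  /** Maps coefficients fitted on the reduced matrix back to full width (0 for skipped columns). */
  static double[] expandCoefficients(double[] fitted, int[] columns, int total) {
    double[] full = new double[total];
    for (int i = 0; i < columns.length; i++) {
      full[columns[i]] = fitted[i];
    }
    return full;
  }

  public static void main(String[] args) {
    // The middle column is all zeros, so only columns 0 and 2 are valid.
    double[][] samples = {{1.0, 0.0, 3.0}, {2.0, 0.0, 5.0}, {4.0, 0.0, 6.0}};
    int[] valid = validColumns(samples);

    // Behaviour described as the bug: give up when valid.length != samples[0].length.
    // Behaviour described as the fix: continue as long as some column is valid.
    if (valid.length > 0) {
      double[][] reduced = keepColumns(samples, valid);
      System.out.println(Arrays.toString(valid));
      System.out.println(Arrays.deepToString(reduced));
      // A regression would be fitted on `reduced` here; that step is omitted.
    }
  }
}
```

With a sample matrix whose middle column is all zeros, the sketch continues with the two remaining columns instead of abandoning the estimate.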
    
    This is fixed in https://github.com/apache/giraph/pull/34 too, but I want to make these changes one-by-one so that we can test in isolation.
    
    Tests:
    - mvn clean install
    - Snapshot tests, including a snapshot test that uses OOC.
    - Ran 3 production jobs and verified that this reduces data spills and that jobs finish faster. The max % spilled is reduced by more than 40%.
    
    JIRA: https://issues.apache.org/jira/browse/GIRAPH-1160
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dlogothetis/giraph fix_mem_est

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/giraph/pull/49.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #49
    
----
commit f5a124beef6b65bf8f9178120fefc1360566fda6
Author: Dionysios Logothetis <di...@fb.com>
Date:   2017-09-19T14:47:56Z

    Fix bug in memory estimation

----


> Fix memory estimation in MemoryEstimatorOracle
> ----------------------------------------------
>
>                 Key: GIRAPH-1160
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1160
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Dionysios Logothetis
>
> Method MemoryEstimatorOracle.calculateRegression() exits early if the number of valid columns available for the regression differs from the total number of columns. This is wrong: the regression can still run on the valid columns alone. This causes the memory estimation to be very inaccurate.


