You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@giraph.apache.org by "Alessandro Presta (JIRA)" <ji...@apache.org> on 2012/08/08 15:44:21 UTC

[jira] [Created] (GIRAPH-293) Should aggregators be checkpointed?

Alessandro Presta created GIRAPH-293:
----------------------------------------

             Summary: Should aggregators be checkpointed?
                 Key: GIRAPH-293
                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
             Project: Giraph
          Issue Type: Bug
            Reporter: Alessandro Presta


As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.

One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
If this is a functionality we want going forward, then persistent aggregators should be checkpointed.


[*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by Eli Reisman <ap...@gmail.com>.

Sorry I ended up out of the loop this week, sounds good to me too!


On Wed, Sep 19, 2012 at 4:25 PM, Avery Ching (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459210#comment-13459210]
>
> Avery Ching commented on GIRAPH-293:
> ------------------------------------
>
> +1.  This is Any objections Eli?
>
> > Should aggregators be checkpointed?
> > -----------------------------------
> >
> >                 Key: GIRAPH-293
> >                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
> >             Project: Giraph
> >          Issue Type: Bug
> >            Reporter: Alessandro Presta
> >            Assignee: Maja Kabiljo
> >         Attachments: GIRAPH-293.patch, GIRAPH-293.patch,
> GIRAPH-293.patch, GIRAPH-293.patch
> >
> >
> > As I understand, we don't include aggregators in checkpoints because
> they are kept in the Zookeeper.
> > One of our bootcampers is working on fixing TestManualCheckpoint, which
> currently involves starting a new job from a checkpoint from a previous
> job*.
> > If this is a functionality we want going forward, then persistent
> aggregators should be checkpointed.
> > [*] That test relies on the fact that either aggregators are
> checkpointed or they are always reset at each superstep. None of these is
> happening, but the error cancels out with the fact that we are not actually
> resuming from a checkpoint, but re-running the job from scratch.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439475#comment-13439475 ] 

Maja Kabiljo commented on GIRAPH-293:
-------------------------------------

https://reviews.apache.org/r/6731/diff/#index_header
Sadly it can't show which code was moved from one file to another.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468282#comment-13468282 ] 

Hudson commented on GIRAPH-293:
-------------------------------

Integrated in Giraph-trunk-Commit #216 (See [https://builds.apache.org/job/Giraph-trunk-Commit/216/])
    GIRAPH-293: Should aggregators be checkpointed? (Revision 1393264)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1393264
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
* /giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/AggregatorHandler.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/MasterAggregatorHandler.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerAggregatorHandler.java
* /giraph/trunk/src/test/java/org/apache/giraph/TestAggregatorsHandling.java
* /giraph/trunk/src/test/java/org/apache/giraph/graph/TestAggregatorsHandling.java

                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468279#comment-13468279 ] 

Avery Ching commented on GIRAPH-293:
------------------------------------

+1, this is great.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459210#comment-13459210 ] 

Avery Ching commented on GIRAPH-293:
------------------------------------

+1.  This is Any objections Eli? 
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-293:
--------------------------------

    Attachment: GIRAPH-293.patch

Rebasing again.

Can someone please take a look, after all checkpointing is broken without this if there is at least one aggregator (it's not just that aggregators aren't checkpointed, it will crash). 

If nobody finds the time to look into it, I'll have to remove aggregator handlers and put the code back to BspService classes. That way we'll get the smaller patch and uglier code ;-)
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435154#comment-13435154 ] 

Maja Kabiljo commented on GIRAPH-293:
-------------------------------------

Ops, I meant GIRAPH-296 and GIRAPH-297.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-293:
--------------------------------

    Attachment: GIRAPH-293.patch

"-1 release audit" was because of the wrong format in which Idea describes deleted files. Uploading again, no changes in the code.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453252#comment-13453252 ] 

Eli Reisman commented on GIRAPH-293:
------------------------------------

Looks great, and a lot of work too.

So the deal with this patch is to separate out the aggregator handling code into different modules but still operating at this stage on zookeeper?

There is definitely code duplication in the master/worker handlers is this needed or will all this be changing in the handler modules as we move to network connections and away from zk? What exactly is the difference for the master and worker handling code? Could there be a common base class handler that implements the common functions worker and master handlers need? Or is the difference hard to factor out?

Anyone else have any particular problems with this code or can we commit this?

                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Giraph QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454088#comment-13454088 ] 

Giraph QA commented on GIRAPH-293:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544674/GIRAPH-293.patch
  against trunk revision 1383115.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 2 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The applied patch does not increase the total number of javadoc warnings.

    +1 checkstyle.  The patch generated 0 code style errors.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    -1 release audit.  The applied patch generated 1 release audit warnings.

    +1 core tests.  The patch passed unit tests in ..

Test results: https://builds.apache.org/job/PreCommit-GIRAPH-Build/39//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-GIRAPH-Build/39//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-GIRAPH-Build/39//console

This message is automatically generated.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-293:
--------------------------------

    Attachment: GIRAPH-293.patch

Rebased, but a test for this fails now because of GIRAPH-355 (which was introduced recently with GIRAPH-274).
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-293:
--------------------------------

    Attachment: GIRAPH-293.patch

Making aggregators work correctly with checkpointing - saving the aggregator name, class, value and whether it's persistent. Apart from that, I removed the code for aggregators handling from BspServiceWorker and BspServiceMaster to separate classes, since I think it's cleaner this way, and those two classes do too much different stuff as it is. But that's the reason why the patch looks big. Later with GIRAPH-273 AggregatorHandler classes should become more independent of BspServices.

I added test for aggregator serialization and manual restarting from checkpoint (that one also relies on recent GIRAPH-296 and GIRAPH-298 working). The patch passes mvn verify and tests in pseudo-distributed mode.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453265#comment-13453265 ] 

Avery Ching commented on GIRAPH-293:
------------------------------------

Quick look is good to me.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo reassigned GIRAPH-293:
-----------------------------------

    Assignee: Maja Kabiljo
    
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453841#comment-13453841 ] 

Maja Kabiljo commented on GIRAPH-293:
-------------------------------------

Thank you for looking!

The deal with the patch was to make checkpointing work, and I separated aggregator code as a bonus :-)

For the code duplication you mean the parts which read/write from ZooKeeper? I didn't pay much attention to making those parts nice, since they are going away soon. I wanted to minimize the change and make it as easy to review as possible, so you can see that those parts are really just copied directly from BspService classes. That's why I keep saying the patch is much smaller than it looks like. The differences between worker and master code are for example that one writes just aggregator names and values while the other also writes aggregator classnames (the opposite for reading); worker just reads final values while master reads values from all workers and aggregates them along the way. 
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-293:
--------------------------------

    Attachment: GIRAPH-293.patch

Forgot to mark as 'Patch Available' last time.

I'm just rebasing it now. Do I need to upload new patch when only line numbers and revision ids are changed?
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Alessandro Presta (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439414#comment-13439414 ] 

Alessandro Presta commented on GIRAPH-293:
------------------------------------------

If there's no merge conflict, you don't need to post the new one.
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453211#comment-13453211 ] 

Eli Reisman commented on GIRAPH-293:
------------------------------------

I will take a look. Scary huh?
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand, we don't include aggregators in checkpoints because they are kept in the Zookeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which currently involves starting a new job from a checkpoint from a previous job*.
> If this is a functionality we want going forward, then persistent aggregators should be checkpointed.
> [*] That test relies on the fact that either aggregators are checkpointed or they are always reset at each superstep. None of these is happening, but the error cancels out with the fact that we are not actually resuming from a checkpoint, but re-running the job from scratch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira