Posted to user@giraph.apache.org by Benjamin Heitmann <be...@deri.org> on 2012/05/02 21:15:37 UTC

Possible bug when resetting aggregators ? (and missing documentation)

Hello, 

I have been using aggregators for various statistics-reporting tasks,
and I noticed that the aggregator operations need to be called in a very specific sequence,
especially when an aggregator is reset between supersteps.

I found that the sequence described in RandomMessageBenchmark (in the org.apache.giraph.benchmark package)
results in consistent counts for one aggregator across all workers. 
The most important thing seems to be to call the reset method, setAggregatedValue(), in preSuperstep() of the WorkerContext class,
before calling this.useAggregator().

If I called the reset method in postSuperstep(), then every worker reported a different value for the aggregator. 
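
The call order described above can be sketched with a small, single-process toy model. Everything in it is a hypothetical stand-in rather than Giraph code: ToySumAggregator, ToyWorkerContext, registerAggregator(), getAggregator(), contribute(), and the aggregator name are made up for illustration, and the rule that "use" expires after each superstep is an assumption. It only shows the ordering (reset, then useAggregator(), both inside preSuperstep()); it does not model multiple workers or reproduce the inconsistent values reported here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy, single-process model of the call order described in the thread.
// Nothing here is Giraph code: every class is a hypothetical stand-in.
class AggregatorOrderSketch {

    /** Stand-in for a sum aggregator holding one long value. */
    static class ToySumAggregator {
        private long value;
        void aggregate(long delta) { value += delta; }
        void setAggregatedValue(long newValue) { value = newValue; }
        long getAggregatedValue() { return value; }
    }

    /** Stand-in for a worker context with per-superstep hooks. */
    static class ToyWorkerContext {
        static final String COUNT = "toy.superstep.count"; // hypothetical name
        private final Map<String, ToySumAggregator> registered = new HashMap<>();
        private final Set<String> inUse = new HashSet<>();

        void registerAggregator(String name) {
            registered.put(name, new ToySumAggregator());
        }

        ToySumAggregator getAggregator(String name) {
            return registered.get(name);
        }

        /** Returns false when the name was never registered. */
        boolean useAggregator(String name) {
            if (!registered.containsKey(name)) {
                return false;
            }
            inUse.add(name);
            return true;
        }

        void preSuperstep() {
            inUse.clear(); // assumption: "use" must be renewed every superstep
            getAggregator(COUNT).setAggregatedValue(0L); // 1. reset first
            useAggregator(COUNT);                        // 2. then mark as used
        }

        /** Stand-in for vertex computation contributing to the aggregator. */
        void contribute(long amount) {
            if (inUse.contains(COUNT)) {
                getAggregator(COUNT).aggregate(amount);
            }
        }

        /** Only reads the value; the reset is deliberately not done here. */
        long postSuperstep() {
            return getAggregator(COUNT).getAggregatedValue();
        }
    }

    /** Runs one superstep per row and returns the value read after each. */
    static long[] run(long[][] contributionsPerSuperstep) {
        ToyWorkerContext context = new ToyWorkerContext();
        context.registerAggregator(ToyWorkerContext.COUNT);
        long[] results = new long[contributionsPerSuperstep.length];
        for (int step = 0; step < contributionsPerSuperstep.length; step++) {
            context.preSuperstep();
            for (long amount : contributionsPerSuperstep[step]) {
                context.contribute(amount);
            }
            results[step] = context.postSuperstep();
        }
        return results;
    }

    public static void main(String[] args) {
        long[] results = run(new long[][] {{2, 3}, {7}, {1, 1}});
        System.out.println(java.util.Arrays.toString(results));
    }
}
```

In this model each superstep's reading reflects only that superstep's contributions, because the reset happens before any contribution is made.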

However, the value of the aggregator that is reset between supersteps is still wrong.

I know this because a second aggregator counts the same thing and reports it after each superstep
without being reset.
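
The cross-check between the two aggregators can be sketched as plain Java with no Giraph classes. It assumes the never-reset aggregator starts at zero and accumulates across supersteps, so its growth in each superstep should equal the value of the aggregator that is reset. The class name, method name, and sample numbers are hypothetical.

```java
// Plain-Java consistency check; no Giraph classes are involved.
class AggregatorCrossCheck {

    /**
     * Returns the index of the first superstep in which the reset aggregator's
     * value differs from the growth of the never-reset total, or -1 if all agree.
     */
    static int firstMismatch(long[] perSuperstep, long[] runningTotal) {
        if (perSuperstep.length != runningTotal.length) {
            throw new IllegalArgumentException("arrays must cover the same supersteps");
        }
        long previousTotal = 0L; // assumption: the running total starts at zero
        for (int step = 0; step < perSuperstep.length; step++) {
            long growth = runningTotal[step] - previousTotal;
            if (growth != perSuperstep[step]) {
                return step;
            }
            previousTotal = runningTotal[step];
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] runningTotal = {5, 12, 14}; // made-up values, never reset
        long[] perSuperstep = {5, 7, 3};   // made-up values, reset each superstep
        System.out.println(firstMismatch(perSuperstep, runningTotal));
    }
}
```

Logging both values after every superstep and running a check like this would show in which superstep the two counts first diverge, which is useful detail for a bug report.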

Is this a known issue? Should I file a bug report on it?


In addition, it would be great to document the correct usage of aggregators somewhere.
Even just the javadoc of the Aggregator interface might be enough.

Should I try to add some documentation to the aggregator interface?
(org.apache.giraph.graph.Aggregator.java)
Then the committers can correct me if that documentation is wrong, I guess. 

Re: Possible bug when resetting aggregators ? (and missing documentation)

Posted by Avery Ching <ac...@apache.org>.
I think you're right that the javadoc isn't specific enough.

    /**
     * Use a registered aggregator in current superstep.
     * Even when the same aggregator should be used in the next
     * superstep, useAggregator needs to be called at the beginning
     * of that superstep in preSuperstep().
     *
     * @param name Name of aggregator
     * @return boolean (false when not registered)
     */
    boolean useAggregator(String name);

This should be augmented to say that none of the Aggregator methods
should be called until this method is invoked.  Feel free to file a JIRA
and fix it.  Thanks!

If you would like to, please feel free to add Aggregator documentation 
to https://cwiki.apache.org/confluence/display/GIRAPH/Index

Avery
