Posted to user@hama.apache.org by Steven van Beelen <sm...@gmail.com> on 2013/04/17 11:27:00 UTC

Possible Aggregator Problem

Hello,

I'm creating my own PageRank in Hama for testing, and I think I found a
problem with the AverageAggregator. I'm not sure if it is me or the
AverageAggregator class in general, but I believe it just returns the mean
of all the values instead of the average difference between the old and new
value as intended.

For testing, I created my own AbsDiffAggregator and AverageAggregator
classes, using FloatWritable instead of DoubleWritable. The same problem
still occurred: I got the mean of all the values in the graph instead of an
average difference.

Could someone tell me if I'm doing something wrong or what I should provide
to better explain my problem?

Regards,
Steven van Beelen, Vrije Universiteit of Amsterdam
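
To make the distinction in the message above concrete, here is a minimal
sketch in plain Java. It does not use any Hama classes; the class name, the
method names, and the sample ranks are assumptions made up for illustration.
It uses five vertices whose values sum to 1.0, like the small graph described
later in this thread, so the plain mean comes out near 0.2 while the mean
absolute difference depends on how much each value changed.

```java
import java.util.Arrays;

// Illustrative only: not part of Hama. All names and numbers are hypothetical.
class AverageVsAverageDifference {

    /** Plain mean of the current vertex values. */
    static double meanOfValues(double[] values) {
        return Arrays.stream(values).sum() / values.length;
    }

    /** Mean of |new - old| taken per vertex, the quantity a convergence check needs. */
    static double meanAbsoluteDifference(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < oldValues.length; i++) {
            sum += Math.abs(newValues[i] - oldValues[i]);
        }
        return sum / oldValues.length;
    }

    public static void main(String[] args) {
        // Five vertices; both rank vectors sum to 1.0.
        double[] oldRanks = {0.20, 0.20, 0.20, 0.20, 0.20};
        double[] newRanks = {0.25, 0.15, 0.30, 0.10, 0.20};

        System.out.println("mean of values:           " + meanOfValues(newRanks));
        System.out.println("mean absolute difference: "
                + meanAbsoluteDifference(oldRanks, newRanks));
    }
}
```

Because PageRank values sum to 1.0, the plain mean stays at roughly 1/n in
every superstep and therefore says nothing about convergence.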

Re: Possible Aggregator Problem

Posted by "Edward J. Yoon" <ed...@apache.org>.
Since the 'aggregator' is being used for counting the number of
updated vertices as well, I think there's no bug.

Can you provide your scenario as a unit test?

On Wed, Apr 24, 2013 at 7:47 PM, Steven van Beelen <sm...@gmail.com> wrote:
> I'm sorry to say so, but the problem still arises. Additionally, I found
> that 'aggregate(v, v.getValue())' is called twice as often as
> 'aggregate(v, lastValue, v.getValue())'.
> I cannot find the reason for this in the AggregationRunner or the
> GraphJobRunner.
> In a case where five vertices exist, 'aggregate(v, v.getValue())' is
> called five times, directly followed by the finalizeAggregation() call.
> After that, five pairs of 'aggregate(v, v.getValue())' and
> 'aggregate(v, lastValue, v.getValue())' are called, as follows logically
> from 'public void aggregateVertex(M lastValue, Vertex<V, E, M> v)' in the
> AggregationRunner class.
>
> In addition, I could give you my code; maybe some flaw in there
> causes this problem?
>
>
> On Wed, Apr 24, 2013 at 10:43 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> Steven,
>>
>> Could you please try your application again with
>> http://people.apache.org/~edwardyoon/dist/test/ and let me know
>> whether it works as you expected?
>>
>> On Wed, Apr 24, 2013 at 4:53 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Thanks for your report. It could be a bug. I'll have a look at it now.
>> >
>> > On Wed, Apr 24, 2013 at 4:48 PM, Steven van Beelen <sm...@gmail.com>
>> wrote:
>> >> I'm running version 0.6.1.
>> >> Looking at the results I found through testing,
>> >>
>> >>   public void aggregateVertex(M lastValue, Vertex<V, E, M> v)
>> >>
>> >> doesn't seem to be the problem. Both 'aggregate(v, v.getValue())' and
>> >> 'aggregate(v, lastValue, v.getValue())'
>> >> are called correctly and work on the same values.
>> >>
>> >> However, when finalizing through 'finalizeAggregation()' in the
>> >> 'public void doMasterAggregation(MapWritable updatedCnt)' method,
>> >> the value aggregated by 'aggregate(v, lastValue, v.getValue())'
>> >> is lost. That is what happens for me.
>> >>
>> >> Could it be that I'm implementing the aggregate methods incorrectly?
>> >>
>> >> In the end however, I can not find a direct bug in TRUNK[1], although
>> >> it is not clear to me what/which part of the code was changed through
>> >> the ticket on JIRA.
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Apr 24, 2013 at 2:41 AM, Edward J. Yoon <edwardyoon@apache.org
>> >wrote:
>> >>
>> >>> I found the ticket on JIRA -
>> >>> https://issues.apache.org/jira/browse/HAMA-659
>> >>>
>> >>> And it seems already fixed.
>> >>>
>> >>> Which version of Hama are you using? And can you find a bug in TRUNK[1]?
>> >>>
>> >>> 1. http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java
>> >>>
>> >>> On Tue, Apr 23, 2013 at 9:41 PM, Steven van Beelen <
>> smcvbeelen@gmail.com>
>> >>> wrote:
>> >>> > Could anyone tell me if I'm correct concerning the possible problem I
>> >>> > posted and replied on in the previous two emails?
>> >>> >
>> >>> >
>> >>> > On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <
>> smcvbeelen@gmail.com
>> >>> >wrote:
>> >>> >
>> >>> >> Additionally, I found this in the mail archives:
>> >>> >>
>> >>> >> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
>> >>> >> This covers my point exactly. Is calling two different aggregate
>> >>> >> functions in a row still considered a bug?
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <
>> >>> smcvbeelen@gmail.com>wrote:
>> >>> >>
>> >>> >>> Hi Thomas,
>> >>> >>>
>> >>> >>> Then I guess I did not explain myself clearly.
>> >>> >>> What you describe is indeed how I expect the AverageAggregator to
>> >>> >>> work, but when I use the AverageAggregator in my own PageRank
>> >>> >>> implementation, it does not return the average of all absolute
>> >>> >>> differences but just the average of all values.
>> >>> >>>
>> >>> >>> The (very) small example graph I use has only five vertices, where
>> >>> >>> the sum of all vertex values is always 1.0.
>> >>> >>> When I use the AverageAggregator, it always returns 0.2 when I call
>> >>> >>> the getLastAggregatedValue method.
>> >>> >>> It shouldn't do that, right?
>> >>> >>>
>> >>> >>>
>> >>> >>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <
>> >>> >>> thomas.jungblut@gmail.com> wrote:
>> >>> >>>
>> >>> >>>> Hi Steven,
>> >>> >>>>
>> >>> >>>> The AverageAggregator is used to determine the average of all
>> >>> >>>> absolute differences between the old PageRank and the new PageRank
>> >>> >>>> of every vertex. This behavior is documented in the Javadoc of the
>> >>> >>>> given classes, and it suffices to track whether the PageRank values
>> >>> >>>> have converged yet.
>> >>> >>>>
>> >>> >>>> What you describe is a perfectly valid way to track the PageRank
>> >>> >>>> difference throughout all supersteps. But this is not how (imho) the
>> >>> >>>> AverageAggregator should behave, so you have to write your own.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> 2013/4/17 Steven van Beelen <sm...@gmail.com>
>> >>> >>>>
>> >>> >>>> > The values in my case are the DoubleWritable values that each
>> >>> >>>> > vertex has and that the aggregators aggregate on.
>> >>> >>>> > My tests showed that, when the aggregator was set to
>> >>> >>>> > AverageAggregator, the average of all the vertex values from the
>> >>> >>>> > past compute step was returned.
>> >>> >>>> > Actually, AverageAggregator should return the average difference
>> >>> >>>> > of all the old-new value pairs of every vertex instead of the mean.
>> >>> >>>> > The average difference is then used to check whether convergence
>> >>> >>>> > is reached, which is relevant for all tasks, of course.
>> >>> >>>> >
>> >>> >>>> > Hence, the convergence point, for which the aggregator is used,
>> >>> >>>> > will not be reached.
>> >>> >>>> > As a result, the algorithm just runs the maximum number of
>> >>> >>>> > iterations set (30 iterations in the PageRank example) in every
>> >>> >>>> > case.
>> >>> >>>> > I experienced the same with my own PageRank implementation.
>> >>> >>>> >
>> >>> >>>> > I think it has something to do with the finalizeAggregation step.
>> >>> >>>> > Next to that, both the 'aggregate(VERTEX vertex, M value)' and
>> >>> >>>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are
>> >>> >>>> > called every time, where one would think only the second (with
>> >>> >>>> > old/new values) would suffice.
>> >>> >>>> > Because of this, the global variable 'absoluteDifference' in the
>> >>> >>>> > 'AbsDiffAggregator' class is overwritten/overruled by the first
>> >>> >>>> > aggregate.
>> >>> >>>> > Additionally, when I made my own aggregator class in the same
>> >>> >>>> > fashion as AbsDiffAggregator and AverageAggregator but left out
>> >>> >>>> > the 'aggregate(VERTEX vertex, M value)' method, my output turned
>> >>> >>>> > out to be 0.0000 every time.
>> >>> >>>> >
>> >>> >>>> > I hope I made myself clear.
>> >>> >>>> > Regards
>> >>> >>>> >
>> >>> >>>> >
>> >>> >>>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <
>> >>> >>>> edwardyoon@apache.org
>> >>> >>>> > >wrote:
>> >>> >>>> >
>> >>> >>>> > > Thanks for your report.
>> >>> >>>> > >
>> >>> >>>> > > What's the meaning of 'all the values'? Please give me more
>> >>> details
>> >>> >>>> > > about your problem.
>> >>> >>>> > >
>> >>> >>>> > > I didn't look closely at the 'dangling links & aggregators' part
>> >>> >>>> > > of the PageRank example, but I think there's no bug. Aggregators
>> >>> >>>> > > are just used for global communication. For example, finding the
>> >>> >>>> > > max value[1] can be done in only one iteration using
>> >>> >>>> > > MaxValueAggregator.
>> >>> >>>> > >
>> >>> >>>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>> >>> >>>> > >
>> >>> >>>> > >
>> >>> >>>> > >
>> >>> >>>> > >
>> >>> >>>> > > --
>> >>> >>>> > > Best Regards, Edward J. Yoon
>> >>> >>>> > > @eddieyoon
>> >>> >>>> > >
>> >>> >>>> >
>> >>> >>>>
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



--
Best Regards, Edward J. Yoon
@eddieyoon

Re: Possible Aggregator Problem

Posted by Steven van Beelen <sm...@gmail.com>.
I'm sorry to say so, but the problem still arises. Additionally, I found
that 'aggregate(v, v.getValue())' is called twice as often as
'aggregate(v, lastValue, v.getValue())'.
I cannot find the reason for this in the AggregationRunner or the
GraphJobRunner.
In a case where five vertices exist, 'aggregate(v, v.getValue())' is
called five times, directly followed by the finalizeAggregation() call.
After that, five pairs of 'aggregate(v, v.getValue())' and
'aggregate(v, lastValue, v.getValue())' are called, as follows logically
from 'public void aggregateVertex(M lastValue, Vertex<V, E, M> v)' in the
AggregationRunner class.

In addition, I could give you my code; maybe some flaw in there
causes this problem?
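
The call pattern described in this message (a single-value aggregate call
and an old/new aggregate call for each vertex) can be sketched in plain Java.
This is only an illustration of one way such paired calls could interfere:
the classes below are hypothetical and are not Hama's AbsDiffAggregator,
AverageAggregator, or AggregationRunner, and whether Hama's classes share
state in this way is an assumption, not something confirmed in this thread.

```java
// Illustrative only: hypothetical classes, not Hama's implementation.
class SharedAccumulatorSketch {

    /** Both overloads add into one field, so raw values leak into the result. */
    static class SharedFieldAggregator {
        private double accumulated = 0.0;
        private int vertexCount = 0;

        void aggregate(double value) {
            accumulated += value;
        }

        void aggregate(double oldValue, double newValue) {
            accumulated += Math.abs(newValue - oldValue);
            vertexCount++;
        }

        double average() {
            return accumulated / vertexCount;
        }
    }

    /** Keeps raw values and differences apart, so the two calls cannot interfere. */
    static class SeparateFieldsAggregator {
        private double valueSum = 0.0;
        private double differenceSum = 0.0;
        private int vertexCount = 0;

        void aggregate(double value) {
            valueSum += value;
        }

        void aggregate(double oldValue, double newValue) {
            differenceSum += Math.abs(newValue - oldValue);
            vertexCount++;
        }

        double valueSum() {
            return valueSum;
        }

        double averageDifference() {
            return differenceSum / vertexCount;
        }
    }

    /** Replays the reported pattern: one pair of aggregate calls per vertex. */
    static double runShared(double[] oldValues, double[] newValues) {
        SharedFieldAggregator aggregator = new SharedFieldAggregator();
        for (int i = 0; i < newValues.length; i++) {
            aggregator.aggregate(newValues[i]);
            aggregator.aggregate(oldValues[i], newValues[i]);
        }
        return aggregator.average();
    }

    static double runSeparate(double[] oldValues, double[] newValues) {
        SeparateFieldsAggregator aggregator = new SeparateFieldsAggregator();
        for (int i = 0; i < newValues.length; i++) {
            aggregator.aggregate(newValues[i]);
            aggregator.aggregate(oldValues[i], newValues[i]);
        }
        return aggregator.averageDifference();
    }

    public static void main(String[] args) {
        double[] oldRanks = {0.20, 0.20, 0.20, 0.20, 0.20};
        double[] newRanks = {0.25, 0.15, 0.30, 0.10, 0.20};
        System.out.println("shared field:    " + runShared(oldRanks, newRanks));
        System.out.println("separate fields: " + runSeparate(oldRanks, newRanks));
    }
}
```

In this sketch the shared-field variant is dominated by the raw vertex
values, while the separate-fields variant reports only the average change.
Note that an earlier message in this thread reports an output of 0.0000 when
the single-value method was left out entirely, so the master-side aggregation
may rely on that method; the sketch does not model that step.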



Re: Possible Aggregator Problem

Posted by "Edward J. Yoon" <ed...@apache.org>.
Steven,

Could you please try your application again with
http://people.apache.org/~edwardyoon/dist/test/ and let me know
whether it works as you expected?




-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Possible Aggregator Problem

Posted by "Edward J. Yoon" <ed...@apache.org>.
Thanks for your report. It could be a bug. I'll have a look at it now.

On Wed, Apr 24, 2013 at 4:48 PM, Steven van Beelen <sm...@gmail.com> wrote:
> I'm running version 0.6.1.
> Looking at the results I found through testing,
>
>   public void aggregateVertex(M lastValue, Vertex<V, E, M> v)
>
> doesn't seem to be the problem. Both 'aggregate(v, v.getValue())' and
> 'aggregate(v, lastValue, v.getValue())'
> are called correctly and work on the same values.
>
> However, when finalizing through 'finalizeAggregation()' in the
> 'public void doMasterAggregation(MapWritable updatedCnt)' method,
> the value aggregated by 'aggregate(v, lastValue, v.getValue())'
> is lost. That is what happens for me.
>
> Could it be that I'm implementing the aggregate methods incorrectly?
>
> In the end however, I can not find a direct bug in TRUNK[1], although
> it is not clear to me what/which part of the code was changed through
> the ticket on JIRA.
>
>
>
>
> On Wed, Apr 24, 2013 at 2:41 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> I found the ticket on JIRA -
>> https://issues.apache.org/jira/browse/HAMA-659
>>
>> And it seems already fixed.
>>
>> Which version of Hama are you using? And can you find a bug in TRUNK[1]?
>>
>> 1. http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java
>>
>> On Tue, Apr 23, 2013 at 9:41 PM, Steven van Beelen <sm...@gmail.com>
>> wrote:
>> > Could anyone tell me if I'm correct concerning the possible problem I
>> > posted and replied on in the previous two emails?
>> >
>> >
>> > On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <smcvbeelen@gmail.com
>> >wrote:
>> >
>> >> Additionally, I found this in the mail archives:
>> >>
>> >> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
>> >> This covers my point exactly. Is calling two different aggregate
>> >> functions in a row still considered a bug?
>> >>
>> >>
>> >> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <
>> smcvbeelen@gmail.com>wrote:
>> >>
>> >>> Hi Thomas,
>> >>>
>> >>> Then I guess I did not explain myself clearly.
>> >>> What you describe is indeed how I expect the AverageAggregator to
>> >>> work, but when I use the AverageAggregator in my own PageRank
>> >>> implementation, it does not return the average of all absolute
>> >>> differences but just the average of all values.
>> >>>
>> >>> The (very) small example graph I use has only five vertices, where
>> >>> the sum of all vertex values is always 1.0.
>> >>> When I use the AverageAggregator, it always returns 0.2 when I call
>> >>> the getLastAggregatedValue method.
>> >>> It shouldn't do that, right?
>> >>>
>> >>>
>> >>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <
>> >>> thomas.jungblut@gmail.com> wrote:
>> >>>
>> >>>> Hi Steven,
>> >>>>
>> >>>> the AverageAggregator is used to determine the average of all absolute
>> >>>> differences between old pagerank and new pagerank for every vertex.
>> >>>> This is documented like it should behave in the javadoc of the given
>> >>>> classes and suffices to track if pagerank values have yet converged or
>> >>>> not.
>> >>>>
>> >>>> What you describe is a perfectly valid way to track the pagerank
>> >>>> difference
>> >>>> throughout all supersteps. But this is not how (imho) the
>> >>>> AverageAggregator
>> >>>> should behave, so you have to write your own.
>> >>>>
>> >>>>
>> >>>> 2013/4/17 Steven van Beelen <sm...@gmail.com>
>> >>>>
>> >>>> > The values in my case are the DoubleWritable values each vertice has
>> >>>> and
>> >>>> > the aggregators aggregate on.
>> >>>> > My tests showed that, when the aggregator was set to
>> >>>> AverageAggregator, the
>> >>>> > average of all the vertice values from the past compute step were
>> >>>> returned.
>> >>>> > Actually, AverageAggregator should return the average difference of
>> >>>> all the
>> >>>> > old-new value pairs of every vertice instead of the mean.
>> >>>> > The average difference is then used to check whether convergence is
>> >>>> > reached, which is relevant for all task ofcourse.
>> >>>> >
>> >>>> > Hence, the convergence point, for which the Aggregator is used, will
>> >>>> not be
>> >>>> > reached.
>> >>>> > This thus makes it so that the algorithm will just run the maximum
>> >>>> number
>> >>>> > of iterations set (30 iterations on the PageRank example) in every
>> >>>> case.
>> >>>> > I experienced the same with my own PageRank implementation.
>> >>>> >
>> >>>> > I think it has something to do with the finalizeAggregation step
>> taken.
>> >>>> > Next to that, both the 'aggregate(VERTEX vertex, M value)' and
>> >>>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are
>> called
>> >>>> every
>> >>>> > time, were one would think only the second (with old/new values)
>> would
>> >>>> > suffice.
>> >>>> > Because of this, the global variable 'absoluteDifference' in the
>> >>>> > 'AbsDiffAggregator' class is overwriten/overruled by the first
>> >>>> aggregate.
>> >>>> > Additionally, if one would make its own Aggregation class in the
>> same
>> >>>> > fashion as AbsDiffAggregator and AverageAggregator, but leave out
>> the
>> >>>> > 'aggregate(VERTEX vertex, M value)', my output turned out to be
>> 0.0000
>> >>>> > every time.
>> >>>> >
>> >>>> > I hope I made myself clear.
>> >>>> > Regards
>> >>>> >
>> >>>> >
>> >>>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <
>> >>>> edwardyoon@apache.org
>> >>>> > >wrote:
>> >>>> >
>> >>>> > > Thanks for your report.
>> >>>> > >
>> >>>> > > What's the meaning of 'all the values'? Please give me more
>> details
>> >>>> > > about your problem.
>> >>>> > >
>> >>>> > > I didn't look at 'dangling links & aggregators' part of PageRank
>> >>>> > > example closely, but I think there's no bug. Aggregators is just
>> used
>> >>>> > > for global communication. For example, finding max value[1] can be
>> >>>> > > done in only one iteration using MaxValueAggregator.
>> >>>> > >
>> >>>> > > 1.
>> >>>> http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>> >>>> > >
>> >>>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <
>> >>>> smcvbeelen@gmail.com
>> >>>> > >
>> >>>> > > wrote:
>> >>>> > > > Hello,
>> >>>> > > >
>> >>>> > > > I'm creating my own pagerank in hama for a testing and I think I
>> >>>> found
>> >>>> > a
>> >>>> > > > problem with the AverageAggregator. I'm not sure if it is me or
>> >>>> the the
>> >>>> > > > AverageAggregator class in general, but I believe it just
>> returns
>> >>>> the
>> >>>> > > mean
>> >>>> > > > of all the values instead of the average difference between the
>> >>>> old and
>> >>>> > > new
>> >>>> > > > value as intended.
>> >>>> > > >
>> >>>> > > > For testing, I created my own AbsDiffAggregator and
>> >>>> AverageAggregator
>> >>>> > > > classes, using FloatWritable instead of DoubleWritables. The
>> same
>> >>>> > problem
>> >>>> > > > still occured: I got a mean of all the values in the graph
>> instead
>> >>>> of
>> >>>> > an
>> >>>> > > > average difference.
>> >>>> > > >
>> >>>> > > > Could someone tell me if I'm doing something wrong or what I
>> should
>> >>>> > > provide
>> >>>> > > > to better explain my problem?
>> >>>> > > >
>> >>>> > > > Regards,
>> >>>> > > > Steven van Beelen, Vrije Universiteit of Amsterdam
>> >>>> > >
>> >>>> > >
>> >>>> > >
>> >>>> > > --
>> >>>> > > Best Regards, Edward J. Yoon
>> >>>> > > @eddieyoon
>> >>>> > >
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Possible Aggregator Problem

Posted by Steven van Beelen <sm...@gmail.com>.
I'm running version 0.6.1.
Looking at the results I found through testing,

  public void aggregateVertex(M lastValue, Vertex<V, E, M> v)

doesn't seem to be the problem. Both 'aggregate(v, v.getValue())' and
'aggregate(v, lastValue, v.getValue())'
are called correctly and work on the same values.

However, when finalizing through 'finalizeAggregation()' in the
'public void doMasterAggregation(MapWritable updatedCnt)' method,
the value accumulated by 'aggregate(v, lastValue, v.getValue())'
is lost. At least, that is what I observe.

Could it be that I'm implementing the aggregate methods incorrectly?

In the end, however, I cannot find a direct bug in TRUNK[1], although
it is not clear to me which part of the code was changed through
the ticket on JIRA.
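
The interaction described above can be illustrated outside Hama with a
small stand-alone sketch. Everything in it is hypothetical:
ToyAbsDiffAggregator is an illustrative stand-in and not Hama's
AbsDiffAggregator, and the only assumption is that both aggregate
variants write to the same accumulator. Under that assumption, calling
both variants per vertex mixes the raw values into the sum of
differences.

```java
public class AggregatorMixSketch {

    /** Hypothetical stand-in for illustration; NOT Hama's AbsDiffAggregator. */
    static final class ToyAbsDiffAggregator {
        private double accumulator = 0.0;

        /** One-argument variant: adds the raw value to the accumulator. */
        void aggregate(double value) {
            accumulator += value;
        }

        /** Two-argument variant: adds |old - new| to the same accumulator. */
        void aggregate(double oldValue, double newValue) {
            accumulator += Math.abs(oldValue - newValue);
        }

        double getValue() {
            return accumulator;
        }
    }

    /** Feeds every vertex through the aggregator, optionally calling both variants. */
    static double run(double[] oldValues, double[] newValues, boolean callBoth) {
        ToyAbsDiffAggregator agg = new ToyAbsDiffAggregator();
        for (int i = 0; i < newValues.length; i++) {
            if (callBoth) {
                agg.aggregate(newValues[i]);
            }
            agg.aggregate(oldValues[i], newValues[i]);
        }
        return agg.getValue();
    }

    public static void main(String[] args) {
        // Made-up values for five vertices; both arrays sum to 1.0.
        double[] oldValues = {0.25, 0.25, 0.25, 0.125, 0.125};
        double[] newValues = {0.25, 0.125, 0.25, 0.25, 0.125};
        System.out.println(run(oldValues, newValues, false)); // prints 0.25
        System.out.println(run(oldValues, newValues, true));  // prints 1.25
    }
}
```

With only the two-argument variant, the accumulator holds the sum of
absolute differences (0.25). With both variants, the sum of the values
(1.0) is added on top, so the result is dominated by the values once the
differences become small. Whether this is what happens in 0.6.1 is not
confirmed here.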




On Wed, Apr 24, 2013 at 2:41 AM, Edward J. Yoon <ed...@apache.org> wrote:

> I found the ticket on JIRA -
> https://issues.apache.org/jira/browse/HAMA-659
>
> It seems to be fixed already.
>
> Which version of Hama are you running, and can you find a bug in TRUNK[1]?
>
> 1.
> http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java

Re: Possible Aggregator Problem

Posted by "Edward J. Yoon" <ed...@apache.org>.
I found the ticket on JIRA - https://issues.apache.org/jira/browse/HAMA-659

It seems to be fixed already.

Which version of Hama are you running, and can you find a bug in TRUNK[1]?

1. http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java

On Tue, Apr 23, 2013 at 9:41 PM, Steven van Beelen <sm...@gmail.com> wrote:
> Could anyone tell me whether I'm correct about the possible problem I
> described in my previous two emails?



--
Best Regards, Edward J. Yoon
@eddieyoon

Re: Possible Aggregator Problem

Posted by Steven van Beelen <sm...@gmail.com>.
Could anyone tell me whether I'm correct about the possible problem I
described in my previous two emails?


On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <sm...@gmail.com> wrote:

> Additionally, I found this in the mail archives:
>
> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
> This covers exactly my point. Is calling two different aggregate functions
> in a row still considered a bug?

Re: Possible Aggregator Problem

Posted by Steven van Beelen <sm...@gmail.com>.
Additionally, I found this in the mail archives:
http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
This covers exactly my point. Is calling two different aggregate functions
in a row still considered a bug?


On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <sm...@gmail.com> wrote:

> Hi Thomas,
>
> Then I guess I did not explain myself clearly.
> What you describe is indeed how I expect the AverageAggregator to work,
> but when I use the AverageAggregator in my own PageRank implementation it
> does not return the average of all absolute differences, just the sum of
> all values divided by the number of vertices.
>
> The (very) small example graph I use has only five vertices, whose values
> always sum to 1.0.
> When I use the AverageAggregator, it always returns 0.2 (1.0 / 5) when
> calling the getLastAggregatedValue method.
> It shouldn't do that, right?

Re: Possible Aggregator Problem

Posted by Steven van Beelen <sm...@gmail.com>.
Hi Thomas,

Then I guess I did not explain myself clearly.
What you describe is indeed how I expect the AverageAggregator to work,
but when I use the AverageAggregator in my own PageRank implementation it
does not return the average of all absolute differences, just the sum of
all values divided by the number of vertices.

The (very) small example graph I use has only five vertices, whose values
always sum to 1.0.
When I use the AverageAggregator, it always returns 0.2 (1.0 / 5) when
calling the getLastAggregatedValue method.
It shouldn't do that, right?
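
To make the two quantities concrete, here is a minimal plain-Java sketch
(no Hama classes are used; the class name, method names and the vertex
values are made up for illustration). It computes both the plain mean of
five values that sum to 1.0 and the mean of the absolute old/new
differences for the same vertices.

```java
public class MeanVersusMeanDiff {

    /** Arithmetic mean of the current vertex values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Mean of |old - new| over all vertices. */
    static double meanAbsDiff(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < newValues.length; i++) {
            sum += Math.abs(oldValues[i] - newValues[i]);
        }
        return sum / newValues.length;
    }

    public static void main(String[] args) {
        // Illustrative values for five vertices; each array sums to 1.0.
        double[] oldValues = {0.25, 0.25, 0.25, 0.125, 0.125};
        double[] newValues = {0.25, 0.125, 0.25, 0.25, 0.125};
        System.out.println(mean(newValues));                   // prints 0.2
        System.out.println(meanAbsDiff(oldValues, newValues)); // prints 0.05
    }
}
```

Because the values always sum to 1.0, the plain mean is 0.2 in every
superstep regardless of how much the individual values change, which
matches the constant 0.2 reported above. The mean absolute difference,
by contrast, depends on the change between supersteps.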


On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <th...@gmail.com> wrote:

> Hi Steven,
>
> The AverageAggregator is used to determine the average of all absolute
> differences between the old and the new PageRank value of every vertex.
> This intended behavior is documented in the javadoc of the given classes
> and suffices to track whether the PageRank values have converged.
>
> What you describe is a perfectly valid way to track the pagerank difference
> throughout all supersteps. But this is not how (imho) the AverageAggregator
> should behave, so you have to write your own.

Re: Possible Aggregator Problem

Posted by Thomas Jungblut <th...@gmail.com>.
Hi Steven,

The AverageAggregator is used to determine the average of all absolute
differences between the old and the new pagerank of every vertex.
This behavior is documented in the javadoc of the given classes, and it
suffices to track whether the pagerank values have converged.

What you describe is a perfectly valid way to track the pagerank difference
throughout all supersteps. But this is not how (imho) the AverageAggregator
should behave, so you have to write your own.
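
To make the two quantities under discussion concrete, here is a minimal
plain-Java sketch. It does not use Hama's classes; the names `mean` and
`meanAbsDiff` and the sample rank values are hypothetical and only
illustrate the difference between "mean of the values" and "mean of the
absolute old/new differences".

```java
class AverageDiffSketch {
  // Mean of the current values only (the behavior reported in this thread).
  static double mean(double[] values) {
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    return values.length == 0 ? 0.0 : sum / values.length;
  }

  // Mean of |old - new| per vertex (the quantity used to judge convergence).
  static double meanAbsDiff(double[] oldValues, double[] newValues) {
    if (oldValues.length != newValues.length) {
      throw new IllegalArgumentException("arrays must have equal length");
    }
    double sum = 0.0;
    for (int i = 0; i < oldValues.length; i++) {
      sum += Math.abs(oldValues[i] - newValues[i]);
    }
    return oldValues.length == 0 ? 0.0 : sum / oldValues.length;
  }

  public static void main(String[] args) {
    // Hypothetical pagerank values for four vertices.
    double[] oldRanks = {0.25, 0.25, 0.25, 0.25};
    double[] newRanks = {0.5, 0.25, 0.125, 0.125};
    System.out.println("mean of new values:  " + mean(newRanks));
    System.out.println("mean abs difference: " + meanAbsDiff(oldRanks, newRanks));
  }
}
```

For normalized pagerank values the first number stays at 1/n in every
superstep, so it can never drop below a small convergence threshold, while
the second shrinks as the ranks settle.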


2013/4/17 Steven van Beelen <sm...@gmail.com>

> The values in my case are the DoubleWritable values that each vertex holds
> and that the aggregators aggregate on.
> My tests showed that, when the aggregator was set to AverageAggregator, the
> average of all the vertex values from the past compute step was returned.
> Instead, AverageAggregator should return the average difference over the
> old-new value pairs of every vertex, not the mean of the values.
> The average difference is then used to check whether convergence is
> reached, which is relevant for all tasks, of course.
>
> Hence the convergence point, for which the aggregator is used, will not be
> reached.
> As a result, the algorithm just runs the maximum number of iterations set
> (30 iterations in the PageRank example) in every case.
> I experienced the same with my own PageRank implementation.
>
> I think it has something to do with the finalizeAggregation step.
> In addition, both the 'aggregate(VERTEX vertex, M value)' and
> 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called every
> time, where one would think only the second (with old/new values) would
> suffice.
> Because of this, the global variable 'absoluteDifference' in the
> 'AbsDiffAggregator' class is overwritten/overruled by the first aggregate
> method.
> Additionally, when I wrote my own aggregator class in the same
> fashion as AbsDiffAggregator and AverageAggregator but left out the
> 'aggregate(VERTEX vertex, M value)' method, my output turned out to be
> 0.0000 every time.
>
> I hope I made myself clear.
> Regards

Re: Possible Aggregator Problem

Posted by Steven van Beelen <sm...@gmail.com>.
The values in my case are the DoubleWritable values that each vertex holds
and that the aggregators aggregate on.
My tests showed that, when the aggregator was set to AverageAggregator, the
average of all the vertex values from the past compute step was returned.
Instead, AverageAggregator should return the average difference over the
old-new value pairs of every vertex, not the mean of the values.
The average difference is then used to check whether convergence is
reached, which is relevant for all tasks, of course.

Hence the convergence point, for which the aggregator is used, will not be
reached.
As a result, the algorithm just runs the maximum number of iterations set
(30 iterations in the PageRank example) in every case.
I experienced the same with my own PageRank implementation.

I think it has something to do with the finalizeAggregation step.
In addition, both the 'aggregate(VERTEX vertex, M value)' and
'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called every
time, where one would think only the second (with old/new values) would
suffice.
Because of this, the global variable 'absoluteDifference' in the
'AbsDiffAggregator' class is overwritten/overruled by the first aggregate
method.
Additionally, when I wrote my own aggregator class in the same
fashion as AbsDiffAggregator and AverageAggregator but left out the
'aggregate(VERTEX vertex, M value)' method, my output turned out to be
0.0000 every time.
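
The overwrite effect described above can be reproduced with a toy model.
This is a sketch under stated assumptions, not Hama's code: the class
`ToyDiffAggregator`, the choice that the single-value method assigns to the
field, and the call order (pair method first, single-value method second)
are all hypothetical and only mirror the call pattern as I understand it.

```java
class OverwriteSketch {
  // Toy stand-in for an aggregator with two aggregate methods sharing one field.
  static class ToyDiffAggregator {
    private double absoluteDifference = 0.0;

    // Two-value variant: accumulates the absolute old/new difference.
    void aggregate(double oldValue, double newValue) {
      absoluteDifference += Math.abs(oldValue - newValue);
    }

    // Single-value variant (assumed behavior): assigns, discarding the sum so far.
    void aggregate(double value) {
      absoluteDifference = value;
    }

    double getValue() {
      return absoluteDifference;
    }
  }

  // Feeds two hypothetical vertices through the aggregator.
  static double run(boolean alsoCallSingleValue) {
    double[] oldRanks = {0.25, 0.25};
    double[] newRanks = {0.5, 0.125};
    ToyDiffAggregator agg = new ToyDiffAggregator();
    for (int i = 0; i < oldRanks.length; i++) {
      agg.aggregate(oldRanks[i], newRanks[i]);
      if (alsoCallSingleValue) {
        agg.aggregate(newRanks[i]);
      }
    }
    return agg.getValue();
  }

  public static void main(String[] args) {
    System.out.println("pair method only: " + run(false)); // sum of differences
    System.out.println("both methods:     " + run(true));  // last raw value wins
  }
}
```

With only the pair method the field holds the summed differences (0.375
here); once the single-value method is also called, the field ends up
holding the last raw vertex value (0.125), which is unrelated to any
difference.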

I hope I made myself clear.
Regards


On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <ed...@apache.org> wrote:

> Thanks for your report.
>
> What's the meaning of 'all the values'? Please give me more details
> about your problem.
>
> I didn't look closely at the 'dangling links & aggregators' part of the
> PageRank example, but I think there's no bug. Aggregators are just used
> for global communication. For example, finding the max value[1] can be
> done in only one iteration using MaxValueAggregator.
>
> 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>

Re: Possible Aggregator Problem

Posted by "Edward J. Yoon" <ed...@apache.org>.
Thanks for your report.

What's the meaning of 'all the values'? Please give me more details
about your problem.

I didn't look closely at the 'dangling links & aggregators' part of the
PageRank example, but I think there's no bug. Aggregators are just used
for global communication. For example, finding the max value[1] can be
done in only one iteration using MaxValueAggregator.

1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
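
As a rough illustration of the "global communication" idea, the sketch
below finds a global maximum in one round: every task reduces its own
partition to a local maximum, and a single combine step merges those. It
is plain Java and does not use Hama's aggregator API; `localMax`,
`globalMax`, and the partition layout are hypothetical.

```java
class MaxAggregationSketch {
  // What one task would compute over its own vertices.
  static double localMax(double[] partition) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : partition) {
      max = Math.max(max, v);
    }
    return max;
  }

  // What a master-side combine step would do with the per-task results.
  static double globalMax(double[][] partitions) {
    double max = Double.NEGATIVE_INFINITY;
    for (double[] partition : partitions) {
      max = Math.max(max, localMax(partition));
    }
    return max;
  }

  public static void main(String[] args) {
    // Three hypothetical tasks holding different vertex values.
    double[][] partitions = {{3.0, 9.0}, {4.0}, {7.0, 1.0}};
    System.out.println("global max: " + globalMax(partitions));
  }
}
```

Because max is associative and commutative, the order in which the local
results arrive does not matter, which is why one round is enough.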

On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <sm...@gmail.com> wrote:
> Hello,
>
> I'm creating my own pagerank in hama for testing and I think I found a
> problem with the AverageAggregator. I'm not sure if it is me or the
> AverageAggregator class in general, but I believe it just returns the mean
> of all the values instead of the average difference between the old and new
> value as intended.
>
> For testing, I created my own AbsDiffAggregator and AverageAggregator
> classes, using FloatWritable instead of DoubleWritable. The same problem
> still occurred: I got a mean of all the values in the graph instead of an
> average difference.
>
> Could someone tell me if I'm doing something wrong or what I should provide
> to better explain my problem?
>
> Regards,
> Steven van Beelen, Vrije Universiteit of Amsterdam



-- 
Best Regards, Edward J. Yoon
@eddieyoon