Posted to user@giraph.apache.org by Matthew Saltz <sa...@gmail.com> on 2014/07/17 11:53:19 UTC

Aggregator result type different from aggregate input type

Hi everyone,

I'm trying to implement my own aggregator, whose aggregated value should be
a Map (for which I can use MapWritable) from an id (LongWritable) to a
custom defined type (which simply extends Writable) that contains several
aggregate metrics. I want vertices to be able to do something along the
lines of

aggregate(MY_MAP_AGGREGATOR, new MyAggregatorMessage(id, stat1, stat2));

and then the map aggregator will do something like

public void aggregate(MyAggregatorMessage m) {

    MapWritable currentMap = (MapWritable) getAggregatedValue();

    if (!currentMap.containsKey(m.getId())) {
        // MyAggregatorData contains the aggregate info I want to keep for
        // each id. Contains init. values for stat1 and stat2
        currentMap.put(m.getId(), new MyAggregatorData());
    }

    // MapWritable.get returns a Writable, so a cast is needed here
    MyAggregatorData oldData = (MyAggregatorData) currentMap.get(m.getId());
    // Performs appropriate aggregates for each stat and stores it. Sum,
    // average, whatever
    oldData.aggregate(m.getStat1(), m.getStat2());
}

However, the problem is that the method signatures
<https://giraph.apache.org/apidocs/org/apache/giraph/aggregators/Aggregator.html> for
Aggregator all have to use the same type. In other words, I can't have

public MapWritable getAggregatedValue()

and

public void aggregate (MyAggregatorMessage m)

because the types are different.

My idea right now is to use a MyAggregatorWritable class that extends
GenericWritable
<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/GenericWritable.html> to
wrap both MyAggregatorMessage and MapWritable and then use that as the
method signature for both, and deal with the rest through casting. I've
already used GenericWritable for something else so the implementation would
be straightforward.
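
To make the wrapper idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: SingleTypeAggregator is a hypothetical stand-in for a single-type aggregator interface (not the actual Giraph API), the wrapper is a hand-written class in the spirit of GenericWritable rather than a GenericWritable subclass, java.util.Map replaces MapWritable, and running sums stand in for the real per-id metrics.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a single-type aggregator interface (not the Giraph API). */
interface SingleTypeAggregator<A> {
    void aggregate(A value);
    A getAggregatedValue();
}

/** Per-id statistics; running sums are an assumed example of the metrics. */
final class MyAggregatorData {
    double stat1Sum;
    double stat2Sum;

    void aggregate(double stat1, double stat2) {
        stat1Sum += stat1;
        stat2Sum += stat2;
    }
}

/** What a vertex would send: an id plus two statistics. */
final class MyAggregatorMessage {
    final long id;
    final double stat1;
    final double stat2;

    MyAggregatorMessage(long id, double stat1, double stat2) {
        this.id = id;
        this.stat1 = stat1;
        this.stat2 = stat2;
    }
}

/** Wrapper holding either a single message or the aggregated map, never both. */
final class MyAggregatorWrapper {
    final MyAggregatorMessage message;     // non-null when wrapping a message
    final Map<Long, MyAggregatorData> map; // non-null when wrapping the map

    private MyAggregatorWrapper(MyAggregatorMessage message,
                                Map<Long, MyAggregatorData> map) {
        this.message = message;
        this.map = map;
    }

    static MyAggregatorWrapper ofMessage(MyAggregatorMessage m) {
        return new MyAggregatorWrapper(m, null);
    }

    static MyAggregatorWrapper ofMap(Map<Long, MyAggregatorData> map) {
        return new MyAggregatorWrapper(null, map);
    }
}

/** Map aggregator whose input and result share the wrapper type. */
final class MapAggregatorSketch implements SingleTypeAggregator<MyAggregatorWrapper> {
    private final Map<Long, MyAggregatorData> current = new HashMap<>();

    @Override
    public void aggregate(MyAggregatorWrapper value) {
        if (value.message != null) {
            // Fold one message into the entry for its id.
            MyAggregatorMessage m = value.message;
            current.computeIfAbsent(m.id, k -> new MyAggregatorData())
                   .aggregate(m.stat1, m.stat2);
        } else {
            // Fold a whole map (e.g. a partial result) in, entry by entry.
            for (Map.Entry<Long, MyAggregatorData> e : value.map.entrySet()) {
                MyAggregatorData d = e.getValue();
                current.computeIfAbsent(e.getKey(), k -> new MyAggregatorData())
                       .aggregate(d.stat1Sum, d.stat2Sum);
            }
        }
    }

    @Override
    public MyAggregatorWrapper getAggregatedValue() {
        return MyAggregatorWrapper.ofMap(current);
    }
}
```

A caller would wrap each message with MyAggregatorWrapper.ofMessage(...) before aggregating and read the result from the map field of getAggregatedValue(). The branch on the payload kind is the part that casting would handle in a real GenericWritable version.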

So, I have a few principal questions, I suppose:

1) Is there a better way to implement this than to use a GenericWritable as
described above? If any of you have code for your own way to do this, I'd
love to see it, and if not, I'd love to contribute what I come up with as a
MapAggregator (in a generic manner) to the Giraph project if that would be
appropriate.

2) Is there anything wrong in principle with this type of solution? In
other words, is there some kind of philosophical or design reason that
having a Map as an aggregator is a bad idea? I know that it might not end
up being very efficient, but as it stands, I'm not seeing any other
solution to my problem; if there's an ordinary kind of workaround that
would be more efficient I'd love to hear it.

3) [Less important and more discussion oriented] Why is the API designed
such that these methods must use the same type? It seems like having an
Aggregator<Message, Result> would be useful.
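
As a rough illustration of what such a two-type interface might look like, here is a sketch in plain Java. Everything in it is hypothetical: TwoTypeAggregator, IdValue, and PerIdSum are names made up for this example and do not exist in Giraph. The merge method is an assumption that a distributed implementation would also need to combine partial results, which may be one reason a single type is convenient.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-type aggregator; this interface does not exist in Giraph. */
interface TwoTypeAggregator<M, R> {
    void aggregate(M message); // fold one input message into the result
    void merge(R partial);     // fold in a partial result (assumed requirement)
    R getAggregatedValue();
}

/** Input type: an id and a value to be summed for that id. */
final class IdValue {
    final long id;
    final double value;

    IdValue(long id, double value) {
        this.id = id;
        this.value = value;
    }
}

/** Takes IdValue messages and produces a map from id to the sum of its values. */
final class PerIdSum implements TwoTypeAggregator<IdValue, Map<Long, Double>> {
    private final Map<Long, Double> sums = new HashMap<>();

    @Override
    public void aggregate(IdValue m) {
        sums.merge(m.id, m.value, Double::sum);
    }

    @Override
    public void merge(Map<Long, Double> partial) {
        partial.forEach((id, v) -> sums.merge(id, v, Double::sum));
    }

    @Override
    public Map<Long, Double> getAggregatedValue() {
        return sums;
    }
}
```

Here the message type (IdValue) and the result type (Map<Long, Double>) differ, so no wrapper or casting is needed on the caller's side.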

I apologize for the rather long message, and I appreciate any help you can
offer. If you need any other information, please let me know and I'll be
happy to provide it. In trying to simplify everything I could easily have
made a mistake or left out something important. Thanks in advance.

Best,
Matthew
http://www.matthewsaltz.com

Re: Aggregator result type different from aggregate input type

Posted by Matthew Saltz <sa...@gmail.com>.
Also, I just thought of another possibility/question:

Is there any way to dynamically register aggregators? In other words,
instead of doing a Map, it would be ideal to just be able to register an
aggregator for each id, but on the fly, since I don't know what all the ids
will be in advance.

Thanks again for the help.

Matthew

