Posted to user@giraph.apache.org by Matthew Cornell <ma...@matthewcornell.org> on 2014/09/16 21:42:35 UTC

how do I maintain a cached List across supersteps?

Hi Folks. I have a custom argument that's passed into my Giraph job that
needs parsing. The parsed value is accessed by my Vertex#compute. To avoid
excessive GC I'd like to cache the parsing results. What's a good way to do
so? I looked at using the ImmutableClassesGiraphConfiguration returned by
getConf(), but it supports only String properties. I looked at using my
custom MasterCompute to manage it, but I couldn't find how to access the
master compute instance from the vertex. My last idea is to use (abuse?) an
aggregator to do this. I'd appreciate your thoughts! -- matt

-- 
Matthew Cornell | matt@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org

Re: how do I maintain a cached List across supersteps?

Posted by Matthew Saltz <sa...@gmail.com>.

Hey Matt,

If you need to share data among all vertices, and that data persists across
supersteps and is created or determined at runtime, I believe an aggregator
is the best way to do this. You can declare an instance variable in the
Computation class and, in that class's preSuperstep method, call
getAggregatedValue to set it. Alternatively, if you can afford to just
reparse the argument in the Computation class at each superstep, you can
read it with getConf() and give that a try.
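
A minimal sketch of the pattern described above, written without Giraph on
the classpath so that it runs standalone. CachingComputationSketch is a
hypothetical stand-in for a Computation class, and its Supplier plays the
role of getAggregatedValue. The List<String> value type is also an
assumption; a real aggregator would hold a Writable. The sketch shows only
the shape: fetch once in preSuperstep, then read the field in compute.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a Giraph Computation class; no Giraph types are used.
class CachingComputationSketch {
    // Stand-in for getAggregatedValue(...): hands back the shared value.
    private final Supplier<List<String>> aggregatedValue;

    // Instance variable refreshed once per superstep and read by every compute call.
    private List<String> cachedList = Collections.emptyList();

    CachingComputationSketch(Supplier<List<String>> aggregatedValue) {
        this.aggregatedValue = aggregatedValue;
    }

    // Runs once per superstep, before any vertex is processed.
    void preSuperstep() {
        cachedList = aggregatedValue.get();
    }

    // Runs once per vertex; uses the cached field instead of fetching again.
    boolean compute(String vertexId) {
        return cachedList.contains(vertexId);
    }
}
```

With this shape the shared value is looked up once per superstep, not once
per vertex, which is the saving the instance variable is meant to provide.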

Best,
Matthew

Re: how do I maintain a cached List across supersteps?

Posted by Matthew Cornell <ma...@matthewcornell.org>.

Thanks to Claudio and Matthew, I went with the WorkerContext solution. Note
that I wrote a MasterCompute.validate() to verify the correct WorkerContext
class was set. Otherwise I was worried my cast would fail. -- matt
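
A rough sketch of that kind of fail-fast check, using hypothetical stand-in
classes and not Giraph's own (WorkerContextStandIn, ListCacheContext,
UnrelatedContext, and ContextValidationSketch are all invented for
illustration, and the real MasterCompute and WorkerContext signatures are
not reproduced here). The idea is simply to reject a wrong class at setup
time so that the later downcast cannot throw a ClassCastException.

```java
// Hypothetical stand-ins for a worker context base class and two subclasses.
abstract class WorkerContextStandIn { }

class ListCacheContext extends WorkerContextStandIn {
    String describe() {
        return "list cache";
    }
}

class UnrelatedContext extends WorkerContextStandIn { }

final class ContextValidationSketch {
    private ContextValidationSketch() { }

    // Fail fast at job setup if the configured class is not the expected one.
    static void validate(Class<? extends WorkerContextStandIn> configured) {
        if (!ListCacheContext.class.isAssignableFrom(configured)) {
            throw new IllegalStateException(
                "Expected " + ListCacheContext.class.getName()
                    + " but got " + configured.getName());
        }
    }

    // The downcast done from vertex code; safe once validate has passed.
    static ListCacheContext asListCache(WorkerContextStandIn context) {
        return (ListCacheContext) context;
    }
}
```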

On Wed, Sep 17, 2014 at 11:49 AM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> I would use a workercontext, it is shared and persistent during
> computation by all vertices in a worker. If it's readonly, you won't have
> to manage concurrency.

Re: how do I maintain a cached List across supersteps?

Posted by Claudio Martella <cl...@gmail.com>.

I would use a WorkerContext. It is shared by all vertices in a worker and
persists for the duration of the computation. If it's read-only, you won't
have to manage concurrency.
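
A minimal sketch of the parse-once idea, assuming the custom argument is a
comma-separated list of integers (the thread does not say what the format
is). ParsedArgContextSketch is a hypothetical stand-in for a custom
WorkerContext subclass, and its preApplication method stands in for a
once-per-worker setup hook; no Giraph classes are used, so the block runs on
its own. The parsed list is wrapped as unmodifiable so that vertices can
read it concurrently without locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a custom WorkerContext subclass; no Giraph types are used.
class ParsedArgContextSketch {
    // Read-only after setup, so concurrent readers need no synchronization.
    private List<Integer> parsed = Collections.emptyList();

    // Counts how many times the raw argument was parsed (for illustration only).
    private int parseCount = 0;

    // Stand-in for a once-per-worker setup hook: parse the raw argument here.
    void preApplication(String rawArgument) {
        List<Integer> values = new ArrayList<>();
        for (String token : rawArgument.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                values.add(Integer.parseInt(trimmed));
            }
        }
        parsed = Collections.unmodifiableList(values);
        parseCount++;
    }

    // Called from vertex code; returns the cached list without reparsing.
    List<Integer> getParsed() {
        return parsed;
    }

    int getParseCount() {
        return parseCount;
    }
}
```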


-- 
   Claudio Martella