You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@giraph.apache.org by Igor Kabiljo <ik...@gmail.com> on 2015/06/03 04:25:13 UTC

Introducing Blocks Framework for Giraph

Heads up, as I will be submitting Blocks Framework to Giraph, so in the
next few days/weeks you will see large number of JIRAs, diffs and new files
related to it.

Blocks Framework is a new API for writing applications, that tries to
address many deficiencies of current Giraph API, while keeping all of its
functionality and performance. Execution is still the same, and Blocks
Framework API translates internally to Giraph API itself.

As we were developing more complex applications, we were finding that it
was very hard to encapsulate parts of the application, and to compose
multiple parts, within current API. And that is important throughout, from
extending and managing complex application, to being able to build
libraries.
For example, let's say we want to write a simple utility that will compute
total edge weight of the graph, and log it on master. First there is no way
to encapsulate that logic - we would have to have some code on the master
(to register aggregator, and to log the result), and we would have to have
a computation that will aggregate values for each vertex. Since
MasterCompute cannot be changed, at best, we could have two functions:
masterComputeStep1 to register aggregator and change computation, and
masterComputeStep2 to log the result. At minimum, we are leaking how many
supersteps this computation takes, computation would need to match message
types before and after it. Now let's say we have two such utilities, and we
want to compose them - there is no way to do so. If we want to modify
existing application to do this at the beginning, before it's own logic -
there is no easy way to do so.

Blocks Framework is primarily trying to solve that problem - by having all
code being written in "Blocks" (as block of code in programming). You can
compose multiple blocks together, to get a more complex logic (for example
new SequenceBlock(block1, block2), or new RepeatBlock(10, iterBlock)). Each
application is represented by a single complex Block. Each block is an
iterator over individual steps - called Pieces (each Piece is itself a
Block). Each Piece encapsulates one message sending, and master compute in
between (so "send" half of one superstep, master compute and "receive" half
of the next superstep).

There is of course much more to it, and a lot of details, which you will be
able to see from diffs and JIRAs.

If any of these bullet points resonate to being painful with you, you are
going to enjoy new framework:
- checking getSuperstep to decide which logic to execute throughout the
code, or keep track of where you are
- changing computation, trying to match computation message types
- having multiple consecutive Giraph jobs, because it is too complex to
have it all in one
- having multiple concrete computation classes, just to support different
vertex/edges types
- wanting to combine multiple algorithms inside single Giraph job

We have developed this framework within Facebook within the last year or
so, and it is mature enough by now to be shared with the community.

Thanks,
Igor

RE: Introducing Blocks Framework for Giraph

Posted by Pavan Kumar A <pa...@outlook.com>.

Way to go Igor! Good luck.
> Date: Tue, 2 Jun 2015 19:25:13 -0700
> Subject: Introducing Blocks Framework for Giraph
> From: ikabiljo@gmail.com
> To: dev@giraph.apache.org
> 
> Heads up, as I will be submitting Blocks Framework to Giraph, so in the
> next few days/weeks you will see large number of JIRAs, diffs and new files
> related to it.
> 
> Blocks Framework is a new API for writing applications, that tries to
> address many deficiencies of current Giraph API, while keeping all of its
> functionality and performance. Execution is still the same, and Blocks
> Framework API translates internally to Giraph API itself.
> 
> As we were developing more complex applications, we were finding that it
> was very hard to encapsulate parts of the application, and to compose
> multiple parts, within current API. And that is important throughout, from
> extending and managing complex application, to being able to build
> libraries.
> For example, let's say we want to write a simple utility that will compute
> total edge weight of the graph, and log it on master. First there is no way
> to encapsulate that logic - we would have to have some code on the master
> (to register aggregator, and to log the result), and we would have to have
> a computation that will aggregate values for each vertex. Since
> MasterCompute cannot be changed, at best, we could have two functions:
> masterComputeStep1 to register aggregator and change computation, and
> masterComputeStep2 to log the result. At minimum, we are leaking how many
> supersteps this computation takes, computation would need to match message
> types before and after it. Now let's say we have two such utilities, and we
> want to compose them - there is no way to do so. If we want to modify
> existing application to do this at the beginning, before it's own logic -
> there is no easy way to do so.
> 
> Blocks Framework is primarily trying to solve that problem - by having all
> code being written in "Blocks" (as block of code in programming). You can
> compose multiple blocks together, to get a more complex logic (for example
> new SequenceBlock(block1, block2), or new RepeatBlock(10, iterBlock)). Each
> application is represented by a single complex Block. Each block is an
> iterator over individual steps - called Pieces (each Piece is itself a
> Block). Each Piece encapsulates one message sending, and master compute in
> between (so "send" half of one superstep, master compute and "receive" half
> of the next superstep).
> 
> There is of course much more to it, and a lot of details, which you will be
> able to see from diffs and JIRAs.
> 
> If any of these bullet points resonate to being painful with you, you are
> going to enjoy new framework:
> - checking getSuperstep to decide which logic to execute throughout the
> code, or keep track of where you are
> - changing computation, trying to match computation message types
> - having multiple consecutive Giraph jobs, because it is too complex to
> have it all in one
> - having multiple concrete computation classes, just to support different
> vertex/edges types
> - wanting to combine multiple algorithms inside single Giraph job
> 
> We have developed this framework within Facebook within the last year or
> so, and it is mature enough by now to be shared with the community.
> 
> Thanks,
> Igor