You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@giraph.apache.org by Jay Hutfles <ja...@gmail.com> on 2013/04/18 16:06:58 UTC

Network flow problems in Giraph

I'm new to Giraph, but am interested in its applications to classic network
flow problems, specifically max flow or min cost problems.  I've looked for
BSP implementations of algorithms for these problems, but I can't find any
discussion regarding this online.  Has anyone had luck implementing such
problems?


The max flow problem seems like it should be adaptable to the BSP model.
 The flow augmenting algorithm developed by Ford and Fulkerson is
essentially:

while the graph contains a path over which flow could be increased,
   increase flow for arcs on the path

Identifying the flow augmenting paths is a simple labeling algorithm, but
I'm not sure how I'd implement the "while the graph contains ..."
condition.  Is that a super step *above* the labeling algorithm's super
steps?


And I have no idea how to start the min cost algorithm.  Anyone have ideas
for how to formulate this?

Thanks for your time, and for the great work on Giraph!

Re: Network flow problems in Giraph

Posted by Jay Hutfles <ja...@gmail.com>.
That makes perfect sense, Claudio.  I was missing the need for the
algorithm to exclude shared global data, and how that's also required for
it to be vertex-centric...  I was naively thinking that it would be
feasible if the logic for graph alterations and messaging could be
determined locally, but there's obviously data behind the logic and
alterations.  That data must ALSO be local to the vertex.  Is that a fair
summary?

I think I'm having the most difficulty overcoming years of thinking
affected by Operations Research and Data Structures, where techniques like
breadth-first search are avoided for more complicated data structures or
techniques requiring tree pivots or network simplex methods...

To be honest, my first opinion of Pregel and Giraph was that they'd be
mainly useful for Coway's Life or Langdon's Ants.  Ever since, I've been
hoping to find a way to leverage it as a better OR/optimization tool.  Kind
of how actor-based concurrency really changed my outlook on parallel
programming, I'm hoping vertex-centric thinking changes how I think about
graph algorithms.

On a completely unrelated note, is there another term for "vertex-centric
thinking?"  I've been typing v-centric, and have gotten lazy and refer to
writing algorithms in "fifth person" (due to the V, and roman numerals...)




On Sat, Apr 20, 2013 at 12:42 PM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> One disclaimer first: not all iterative algorithms fit well to Giraph.
> Specifically to your example, Dijkstra's method for single source shortest
> path does not. The main general reason is that it does not scale in
> parallel, and specifically to Giraph because it requires centralised shared
> data. In fact, it makes more sense to run a less-efficient but massively
> parallelizable algorithm: Breadth-first search (BFS).
>
> Coming to maxflow problem, you'd have to look for one of the algorithms
> that is iterative and fits easily to a local vertex-centric view (where
> vertices exchange information through messages). Quite honestly, I haven't
> had the need/chance to look at a maxflow problem till now, so I don't have
> a solution for you right now. After a quick look, it appears to me that
> http://en.wikipedia.org/wiki/Dinitz_blocking_flow_algorithm *could* be a
> good starting point.
>
>
>
>
> On Thu, Apr 18, 2013 at 4:41 PM, Jay Hutfles <ja...@gmail.com> wrote:
>
>> Actually, I think a max flow problem fits exactly with the batch
>> processing model you describe.  You are given a massive graph (with
>> predefined maximum flows along the edges between vertices), you run a
>> program, it produces an output (i.e. the flow along each edge) and it
>> terminates.  It's not necessarily adapting to any new inputs as it runs.
>>  But it is an iterative process.  Or, at least, many algorithms are.
>>
>> I don't see it as being that different from Djikstra's Method for finding
>> the shortest distance between two nodes on a graph.  Each super step is
>> updating the labels along the graph, and when all notes are labelled as
>> done, the algorithm finishes.  A max flow problem could be implemented
>> likewise, since there are labeling algorithms for determining the max flow
>> along a graph.
>>
>> See http://en.wikipedia.org/wiki/Maximum_flow_problem, specifically the
>> Solutions section.  I think it should help clarify.
>>
>>
>>
>> On Thu, Apr 18, 2013 at 9:16 AM, André Kelpe <
>> efeshundertelf@googlemail.com> wrote:
>>
>>> Hi Jay,
>>>
>>> this sounds like a continuous operation to me. Giraph is meant for
>>> batch processing of massive graphs, which produces an output after a
>>> successful run. You run a program, it produces an output, it
>>> terminates. From what I understand, a stream processing framework like
>>> storm (https://github.com/nathanmarz/storm/wiki/Rationale) could be a
>>> better fit for this. Please let me know, if I am missing something.
>>>
>>> André
>>>
>>> 2013/4/18 Jay Hutfles <ja...@gmail.com>:
>>> > I'm new to Giraph, but am interested in its applications to classic
>>> network
>>> > flow problems, specifically max flow or min cost problems.  I've
>>> looked for
>>> > BSP implementations of algorithms for these problems, but I can't find
>>> any
>>> > discussion regarding this online.  Has anyone had luck implementing
>>> such
>>> > problems?
>>> >
>>> >
>>> > The max flow problem seems like it should be adaptable to the BSP
>>> model.
>>> > The flow augmenting algorithm developed by Ford and Fulkerson is
>>> > essentially:
>>> >
>>> > while the graph contains a path over which flow could be increased,
>>> >    increase flow for arcs on the path
>>> >
>>> > Identifying the flow augmenting paths is a simple labeling algorithm,
>>> but
>>> > I'm not sure how I'd implement the "while the graph contains ..."
>>> condition.
>>> > Is that a super step above the labeling algorithm's super steps?
>>> >
>>> >
>>> > And I have no idea how to start the min cost algorithm.  Anyone have
>>> ideas
>>> > for how to formulate this?
>>> >
>>> > Thanks for your time, and for the great work on Giraph!
>>>
>>
>>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>

Re: Network flow problems in Giraph

Posted by Eli Reisman <ap...@gmail.com>.
Great point Claudio, I had considered this in regards to Dijkstra/BFS
before when considering our own shortest paths example, but have not looked
at maxflow myself. That makes a lot of sense. Considering our earlier phone
call, this sort of thing brings up ideas about how to best make clear in
documentation not just what Giraph can do, but how it should do it. Lots to
think about here...



On Sat, Apr 20, 2013 at 1:42 PM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> One disclaimer first: not all iterative algorithms fit well to Giraph.
> Specifically to your example, Dijkstra's method for single source shortest
> path does not. The main general reason is that it does not scale in
> parallel, and specifically to Giraph because it requires centralised shared
> data. In fact, it makes more sense to run a less-efficient but massively
> parallelizable algorithm: Breadth-first search (BFS).
>
> Coming to maxflow problem, you'd have to look for one of the algorithms
> that is iterative and fits easily to a local vertex-centric view (where
> vertices exchange information through messages). Quite honestly, I haven't
> had the need/chance to look at a maxflow problem till now, so I don't have
> a solution for you right now. After a quick look, it appears to me that
> http://en.wikipedia.org/wiki/Dinitz_blocking_flow_algorithm *could* be a
> good starting point.
>
>
>
>
> On Thu, Apr 18, 2013 at 4:41 PM, Jay Hutfles <ja...@gmail.com> wrote:
>
>> Actually, I think a max flow problem fits exactly with the batch
>> processing model you describe.  You are given a massive graph (with
>> predefined maximum flows along the edges between vertices), you run a
>> program, it produces an output (i.e. the flow along each edge) and it
>> terminates.  It's not necessarily adapting to any new inputs as it runs.
>>  But it is an iterative process.  Or, at least, many algorithms are.
>>
>> I don't see it as being that different from Djikstra's Method for finding
>> the shortest distance between two nodes on a graph.  Each super step is
>> updating the labels along the graph, and when all notes are labelled as
>> done, the algorithm finishes.  A max flow problem could be implemented
>> likewise, since there are labeling algorithms for determining the max flow
>> along a graph.
>>
>> See http://en.wikipedia.org/wiki/Maximum_flow_problem, specifically the
>> Solutions section.  I think it should help clarify.
>>
>>
>>
>> On Thu, Apr 18, 2013 at 9:16 AM, André Kelpe <
>> efeshundertelf@googlemail.com> wrote:
>>
>>> Hi Jay,
>>>
>>> this sounds like a continuous operation to me. Giraph is meant for
>>> batch processing of massive graphs, which produces an output after a
>>> successful run. You run a program, it produces an output, it
>>> terminates. From what I understand, a stream processing framework like
>>> storm (https://github.com/nathanmarz/storm/wiki/Rationale) could be a
>>> better fit for this. Please let me know, if I am missing something.
>>>
>>> André
>>>
>>> 2013/4/18 Jay Hutfles <ja...@gmail.com>:
>>> > I'm new to Giraph, but am interested in its applications to classic
>>> network
>>> > flow problems, specifically max flow or min cost problems.  I've
>>> looked for
>>> > BSP implementations of algorithms for these problems, but I can't find
>>> any
>>> > discussion regarding this online.  Has anyone had luck implementing
>>> such
>>> > problems?
>>> >
>>> >
>>> > The max flow problem seems like it should be adaptable to the BSP
>>> model.
>>> > The flow augmenting algorithm developed by Ford and Fulkerson is
>>> > essentially:
>>> >
>>> > while the graph contains a path over which flow could be increased,
>>> >    increase flow for arcs on the path
>>> >
>>> > Identifying the flow augmenting paths is a simple labeling algorithm,
>>> but
>>> > I'm not sure how I'd implement the "while the graph contains ..."
>>> condition.
>>> > Is that a super step above the labeling algorithm's super steps?
>>> >
>>> >
>>> > And I have no idea how to start the min cost algorithm.  Anyone have
>>> ideas
>>> > for how to formulate this?
>>> >
>>> > Thanks for your time, and for the great work on Giraph!
>>>
>>
>>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com
>

Re: Network flow problems in Giraph

Posted by Claudio Martella <cl...@gmail.com>.
One disclaimer first: not all iterative algorithms fit well to Giraph.
Specifically to your example, Dijkstra's method for single source shortest
path does not. The main general reason is that it does not scale in
parallel, and specifically to Giraph because it requires centralised shared
data. In fact, it makes more sense to run a less-efficient but massively
parallelizable algorithm: Breadth-first search (BFS).

Coming to maxflow problem, you'd have to look for one of the algorithms
that is iterative and fits easily to a local vertex-centric view (where
vertices exchange information through messages). Quite honestly, I haven't
had the need/chance to look at a maxflow problem till now, so I don't have
a solution for you right now. After a quick look, it appears to me that
http://en.wikipedia.org/wiki/Dinitz_blocking_flow_algorithm *could* be a
good starting point.




On Thu, Apr 18, 2013 at 4:41 PM, Jay Hutfles <ja...@gmail.com> wrote:

> Actually, I think a max flow problem fits exactly with the batch
> processing model you describe.  You are given a massive graph (with
> predefined maximum flows along the edges between vertices), you run a
> program, it produces an output (i.e. the flow along each edge) and it
> terminates.  It's not necessarily adapting to any new inputs as it runs.
>  But it is an iterative process.  Or, at least, many algorithms are.
>
> I don't see it as being that different from Djikstra's Method for finding
> the shortest distance between two nodes on a graph.  Each super step is
> updating the labels along the graph, and when all notes are labelled as
> done, the algorithm finishes.  A max flow problem could be implemented
> likewise, since there are labeling algorithms for determining the max flow
> along a graph.
>
> See http://en.wikipedia.org/wiki/Maximum_flow_problem, specifically the
> Solutions section.  I think it should help clarify.
>
>
>
> On Thu, Apr 18, 2013 at 9:16 AM, André Kelpe <
> efeshundertelf@googlemail.com> wrote:
>
>> Hi Jay,
>>
>> this sounds like a continuous operation to me. Giraph is meant for
>> batch processing of massive graphs, which produces an output after a
>> successful run. You run a program, it produces an output, it
>> terminates. From what I understand, a stream processing framework like
>> storm (https://github.com/nathanmarz/storm/wiki/Rationale) could be a
>> better fit for this. Please let me know, if I am missing something.
>>
>> André
>>
>> 2013/4/18 Jay Hutfles <ja...@gmail.com>:
>> > I'm new to Giraph, but am interested in its applications to classic
>> network
>> > flow problems, specifically max flow or min cost problems.  I've looked
>> for
>> > BSP implementations of algorithms for these problems, but I can't find
>> any
>> > discussion regarding this online.  Has anyone had luck implementing such
>> > problems?
>> >
>> >
>> > The max flow problem seems like it should be adaptable to the BSP model.
>> > The flow augmenting algorithm developed by Ford and Fulkerson is
>> > essentially:
>> >
>> > while the graph contains a path over which flow could be increased,
>> >    increase flow for arcs on the path
>> >
>> > Identifying the flow augmenting paths is a simple labeling algorithm,
>> but
>> > I'm not sure how I'd implement the "while the graph contains ..."
>> condition.
>> > Is that a super step above the labeling algorithm's super steps?
>> >
>> >
>> > And I have no idea how to start the min cost algorithm.  Anyone have
>> ideas
>> > for how to formulate this?
>> >
>> > Thanks for your time, and for the great work on Giraph!
>>
>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: Network flow problems in Giraph

Posted by Eli Reisman <ap...@gmail.com>.
I would point this question at Sebastian or Claudio, but my instict is yes
this is a good fit for Giraph.


On Thu, Apr 18, 2013 at 10:41 AM, Jay Hutfles <ja...@gmail.com> wrote:

> Actually, I think a max flow problem fits exactly with the batch
> processing model you describe.  You are given a massive graph (with
> predefined maximum flows along the edges between vertices), you run a
> program, it produces an output (i.e. the flow along each edge) and it
> terminates.  It's not necessarily adapting to any new inputs as it runs.
>  But it is an iterative process.  Or, at least, many algorithms are.
>
> I don't see it as being that different from Djikstra's Method for finding
> the shortest distance between two nodes on a graph.  Each super step is
> updating the labels along the graph, and when all notes are labelled as
> done, the algorithm finishes.  A max flow problem could be implemented
> likewise, since there are labeling algorithms for determining the max flow
> along a graph.
>
> See http://en.wikipedia.org/wiki/Maximum_flow_problem, specifically the
> Solutions section.  I think it should help clarify.
>
>
>
> On Thu, Apr 18, 2013 at 9:16 AM, André Kelpe <
> efeshundertelf@googlemail.com> wrote:
>
>> Hi Jay,
>>
>> this sounds like a continuous operation to me. Giraph is meant for
>> batch processing of massive graphs, which produces an output after a
>> successful run. You run a program, it produces an output, it
>> terminates. From what I understand, a stream processing framework like
>> storm (https://github.com/nathanmarz/storm/wiki/Rationale) could be a
>> better fit for this. Please let me know, if I am missing something.
>>
>> André
>>
>> 2013/4/18 Jay Hutfles <ja...@gmail.com>:
>> > I'm new to Giraph, but am interested in its applications to classic
>> network
>> > flow problems, specifically max flow or min cost problems.  I've looked
>> for
>> > BSP implementations of algorithms for these problems, but I can't find
>> any
>> > discussion regarding this online.  Has anyone had luck implementing such
>> > problems?
>> >
>> >
>> > The max flow problem seems like it should be adaptable to the BSP model.
>> > The flow augmenting algorithm developed by Ford and Fulkerson is
>> > essentially:
>> >
>> > while the graph contains a path over which flow could be increased,
>> >    increase flow for arcs on the path
>> >
>> > Identifying the flow augmenting paths is a simple labeling algorithm,
>> but
>> > I'm not sure how I'd implement the "while the graph contains ..."
>> condition.
>> > Is that a super step above the labeling algorithm's super steps?
>> >
>> >
>> > And I have no idea how to start the min cost algorithm.  Anyone have
>> ideas
>> > for how to formulate this?
>> >
>> > Thanks for your time, and for the great work on Giraph!
>>
>
>

Re: Network flow problems in Giraph

Posted by Jay Hutfles <ja...@gmail.com>.
Actually, I think a max flow problem fits exactly with the batch processing
model you describe.  You are given a massive graph (with predefined maximum
flows along the edges between vertices), you run a program, it produces an
output (i.e. the flow along each edge) and it terminates.  It's not
necessarily adapting to any new inputs as it runs.  But it is an iterative
process.  Or, at least, many algorithms are.

I don't see it as being that different from Djikstra's Method for finding
the shortest distance between two nodes on a graph.  Each super step is
updating the labels along the graph, and when all notes are labelled as
done, the algorithm finishes.  A max flow problem could be implemented
likewise, since there are labeling algorithms for determining the max flow
along a graph.

See http://en.wikipedia.org/wiki/Maximum_flow_problem, specifically the
Solutions section.  I think it should help clarify.



On Thu, Apr 18, 2013 at 9:16 AM, André Kelpe
<ef...@googlemail.com>wrote:

> Hi Jay,
>
> this sounds like a continuous operation to me. Giraph is meant for
> batch processing of massive graphs, which produces an output after a
> successful run. You run a program, it produces an output, it
> terminates. From what I understand, a stream processing framework like
> storm (https://github.com/nathanmarz/storm/wiki/Rationale) could be a
> better fit for this. Please let me know, if I am missing something.
>
> André
>
> 2013/4/18 Jay Hutfles <ja...@gmail.com>:
> > I'm new to Giraph, but am interested in its applications to classic
> network
> > flow problems, specifically max flow or min cost problems.  I've looked
> for
> > BSP implementations of algorithms for these problems, but I can't find
> any
> > discussion regarding this online.  Has anyone had luck implementing such
> > problems?
> >
> >
> > The max flow problem seems like it should be adaptable to the BSP model.
> > The flow augmenting algorithm developed by Ford and Fulkerson is
> > essentially:
> >
> > while the graph contains a path over which flow could be increased,
> >    increase flow for arcs on the path
> >
> > Identifying the flow augmenting paths is a simple labeling algorithm, but
> > I'm not sure how I'd implement the "while the graph contains ..."
> condition.
> > Is that a super step above the labeling algorithm's super steps?
> >
> >
> > And I have no idea how to start the min cost algorithm.  Anyone have
> ideas
> > for how to formulate this?
> >
> > Thanks for your time, and for the great work on Giraph!
>

Re: Network flow problems in Giraph

Posted by André Kelpe <ef...@googlemail.com>.
Hi Jay,

this sounds like a continuous operation to me. Giraph is meant for
batch processing of massive graphs, which produces an output after a
successful run. You run a program, it produces an output, it
terminates. From what I understand, a stream processing framework like
storm (https://github.com/nathanmarz/storm/wiki/Rationale) could be a
better fit for this. Please let me know, if I am missing something.

André

2013/4/18 Jay Hutfles <ja...@gmail.com>:
> I'm new to Giraph, but am interested in its applications to classic network
> flow problems, specifically max flow or min cost problems.  I've looked for
> BSP implementations of algorithms for these problems, but I can't find any
> discussion regarding this online.  Has anyone had luck implementing such
> problems?
>
>
> The max flow problem seems like it should be adaptable to the BSP model.
> The flow augmenting algorithm developed by Ford and Fulkerson is
> essentially:
>
> while the graph contains a path over which flow could be increased,
>    increase flow for arcs on the path
>
> Identifying the flow augmenting paths is a simple labeling algorithm, but
> I'm not sure how I'd implement the "while the graph contains ..." condition.
> Is that a super step above the labeling algorithm's super steps?
>
>
> And I have no idea how to start the min cost algorithm.  Anyone have ideas
> for how to formulate this?
>
> Thanks for your time, and for the great work on Giraph!