Posted to user@giraph.apache.org by Gustavo Enrique Salazar Torres <gs...@ime.usp.br> on 2012/12/11 14:58:01 UTC

Breadth-first search

Hi:

I implemented a graph algorithm to recommend content to our users. Although
it works (the implementation uses Mahout), it is very inefficient because I
have to run many iterations to perform a breadth-first search on my graph.
I would like to use Giraph for that task, and I would like to know whether it
is production ready. I'm running jobs on Amazon EMR.

Thanks in advance.
Gustavo
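
[Editor's sketch] For readers unfamiliar with why breadth-first search takes
many iterations in an iterative batch setting, here is a minimal
level-synchronous BFS in plain Java. It is only an illustration: it uses
neither the Mahout nor the Giraph API, and the class name, method name, and
sample graph are hypothetical. Each pass of the outer loop expands the search
by one hop, so the number of passes grows with the depth of the search;
presumably each such pass corresponds to one of the iterations mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: plain Java, not the Giraph or Mahout API.
class LevelSyncBfs {

    /**
     * Returns the hop distance from source to every vertex (-1 if unreachable).
     * Each pass of the outer loop expands the frontier by exactly one hop.
     */
    static int[] distances(List<List<Integer>> adjacency, int source) {
        int[] dist = new int[adjacency.size()];
        Arrays.fill(dist, -1);
        dist[source] = 0;

        List<Integer> frontier = new ArrayList<>();
        frontier.add(source);

        int level = 0;
        while (!frontier.isEmpty()) {
            level++;
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                for (int neighbor : adjacency.get(v)) {
                    // First visit fixes the distance; later visits are ignored.
                    if (dist[neighbor] == -1) {
                        dist[neighbor] = level;
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical 5-vertex graph: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
        List<List<Integer>> graph = List.of(
            List.of(1, 2), List.of(0, 3), List.of(0, 3), List.of(1, 2), List.of());
        System.out.println(Arrays.toString(distances(graph, 0)));
    }
}
```

In a vertex-centric system the inner loops would run in parallel across
vertices, but the one-pass-per-hop structure stays the same.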

Re: Breadth-first search

Posted by Jan van der Lugt <ja...@gmail.com>.

Hi Gustavo,

If your graph fits in memory, you might be interested in Green-Marl, a
language tailored for graph processing:
https://github.com/stanford-ppl/Green-Marl
You can compile your Green-Marl program to an extremely fast C++ program,
and also to a Giraph program once your graph no longer fits in memory.

- Jan

Re: Breadth-first search

Posted by Gustavo Enrique Salazar Torres <gs...@ime.usp.br>.

Hi Avery:

Regarding resources, I guess I won't need that much: our graph has only
60,000 nodes. I believe one c1.xlarge EC2 machine can handle this, and we can
scale if needed.

Thank you very much.
Gustavo

Re: Breadth-first search

Posted by Alexandros Daglis <al...@epfl.ch>.

Dear Avery,

Regarding this decision about resource allocation, do you have a
methodology or a rule of thumb that helps you decide which setting is
expected to perform well?
For example, with a given input (number of graph vertices), can you
estimate what number of workers and how much memory per worker would be
optimal? Or the other way around: given a pool of resources (cores &
memory), what's a reasonable graph size?

That insight would be really interesting.

Thanks,
Alexandros
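
[Editor's sketch] No rule of thumb is given in this thread. Purely to
illustrate the kind of arithmetic the question implies, the Java sketch below
divides an assumed in-memory graph size evenly across workers. Every constant
in it (bytes per vertex, bytes per edge, edges per vertex) is a hypothetical
placeholder rather than a measured Giraph figure; only the 60,000-vertex count
comes from the thread. It ignores message buffers, JVM overhead, and partition
skew, so treat the result as a lower bound at best.

```java
// Back-of-envelope arithmetic only; this is not Giraph's own sizing method.
class WorkerMemoryEstimate {

    /**
     * Estimated bytes of graph data each worker must hold, assuming the
     * graph is split evenly across workers (rounded up).
     */
    static long bytesPerWorker(long vertices, long edges,
                               long bytesPerVertex, long bytesPerEdge,
                               int workers) {
        long total = vertices * bytesPerVertex + edges * bytesPerEdge;
        return (total + workers - 1) / workers; // ceiling division
    }

    public static void main(String[] args) {
        long vertices = 60_000L;          // vertex count mentioned in the thread
        long edges = vertices * 50;       // ASSUMPTION: 50 edges per vertex
        long bytesPerVertex = 100;        // ASSUMPTION: placeholder object cost
        long bytesPerEdge = 20;           // ASSUMPTION: placeholder object cost

        for (int workers : new int[] {1, 4}) {
            long bytes = bytesPerWorker(vertices, edges,
                                        bytesPerVertex, bytesPerEdge, workers);
            System.out.println(workers + " worker(s): " + bytes + " bytes each");
        }
    }
}
```

Replacing the placeholder costs with sizes measured from a real job would be
the first step toward a usable estimate.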

Re: Breadth-first search

Posted by Avery Ching <ac...@apache.org>.

We are running several Giraph applications in production using our
version of Hadoop (Corona) at Facebook. The part you have to be careful
about is ensuring you have enough resources for your job to run. Otherwise,
we are able to run at FB scale (i.e. 1 billion+ nodes and many more edges).

Avery
