You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@giraph.apache.org by Angelo Immediata <an...@gmail.com> on 2014/03/26 11:11:26 UTC

Information

Hi there

In my project I have to implement a routing system with good performance;
at the beginning this system should be able in giving routes information
only for one italian region (Lombardia) but it could be used for the whole
Italy (or world....)
Let's stop to the Lombardia for now. By reading OSM files I can create my
own graph in the best format i can use it; then I need to use Dijkstra (or
any other algorithm) in order to propose to the user K possible paths from
point A to point B (K becouse i need to show to the customer also the
alternatives). I can't use Contraction Herarchy algorithm becouse I need to
take care of external events that can modify the weights on my built graph
and this implies that I should create the "contracted" graph once again and
this can be a very onerous operation

By my experimentations, I saw that by reading the Lombardia OSM file I
should create a graph with around 1 million of vertexes and 6 million of
edges and I was thinking to use Giraph to solve my issue (I saw this link
http://giraph.apache.org/intro.html where you talked about shortestpaths
problem
I have a couple of question for you giraph/hadoop gurus

   - does it make sense to use giraph for my scenario?
   - must i respect some graph format to pass to the giraph algorithm in
   order to have K shortest paths from point A to point B? If so....which
   format should I respect?
   - what would be perfomance by using giraph? I know that Dijstra
   algorithm problem is that it is slow.....by using giraph will I be able in
   improving its performances on very large graph?

I know these can seem very basic questions, but I'm pretty new to giraph
and I'm trying to understand it

Thank you
Angelo

Re: Information

Posted by Claudio Martella <cl...@gmail.com>.
Nope, you can think about Giraph as MapReduce for graphs. Probably neo4j &
C is the way to go for you.


On Wed, Mar 26, 2014 at 3:18 PM, Angelo Immediata <an...@gmail.com>wrote:

> hi Sebastian
>
> OK...I got it....I was thinking I could use it for an online scenario......
> Thank you
>
> Angelo
>
>
> 2014-03-26 14:52 GMT+01:00 Sebastian Schelter <ss...@apache.org>:
>
> Hi Angelo,
>>
>> It very much depends on your use case. Do you want to precompute paths
>> offline in batch or are you looking for a system that answers online?
>> Giraph has been built for the first scenario.
>>
>> --sebastian
>>
>>
>> On 03/26/2014 02:48 PM, Angelo Immediata wrote:
>>
>>> hi Claudio
>>>
>>> so, if I understood correctly, it has no sense to use Giraph for shortest
>>> path calculation in my scenario
>>>
>>> Am I right?
>>>
>>>
>>> 2014-03-26 13:27 GMT+01:00 Claudio Martella <claudio.martella@gmail.com
>>> >:
>>>
>>>  It looks like you're expecting to use Giraph in an online fashion, such
>>>> as
>>>> you would use a database to answer queries within milliseconds or
>>>> seconds.
>>>> Giraph is an offline batch processing system.
>>>>
>>>>
>>>> On Wed, Mar 26, 2014 at 11:11 AM, Angelo Immediata <angeloimm@gmail.com
>>>> >wrote:
>>>>
>>>>  Hi there
>>>>>
>>>>> In my project I have to implement a routing system with good
>>>>> performance;
>>>>> at the beginning this system should be able in giving routes
>>>>> information
>>>>> only for one italian region (Lombardia) but it could be used for the
>>>>> whole
>>>>> Italy (or world....)
>>>>> Let's stop to the Lombardia for now. By reading OSM files I can create
>>>>> my
>>>>> own graph in the best format i can use it; then I need to use Dijkstra
>>>>> (or
>>>>> any other algorithm) in order to propose to the user K possible paths
>>>>> from
>>>>> point A to point B (K becouse i need to show to the customer also the
>>>>> alternatives). I can't use Contraction Herarchy algorithm becouse I
>>>>> need to
>>>>> take care of external events that can modify the weights on my built
>>>>> graph
>>>>> and this implies that I should create the "contracted" graph once
>>>>> again and
>>>>> this can be a very onerous operation
>>>>>
>>>>> By my experimentations, I saw that by reading the Lombardia OSM file I
>>>>> should create a graph with around 1 million of vertexes and 6 million
>>>>> of
>>>>> edges and I was thinking to use Giraph to solve my issue (I saw this
>>>>> link
>>>>> http://giraph.apache.org/intro.html where you talked about
>>>>> shortestpaths
>>>>> problem
>>>>> I have a couple of question for you giraph/hadoop gurus
>>>>>
>>>>>     - does it make sense to use giraph for my scenario?
>>>>>     - must i respect some graph format to pass to the giraph algorithm
>>>>> in
>>>>>
>>>>>     order to have K shortest paths from point A to point B? If
>>>>> so....which
>>>>>     format should I respect?
>>>>>     - what would be perfomance by using giraph? I know that Dijstra
>>>>>
>>>>>     algorithm problem is that it is slow.....by using giraph will I be
>>>>> able in
>>>>>     improving its performances on very large graph?
>>>>>
>>>>> I know these can seem very basic questions, but I'm pretty new to
>>>>> giraph
>>>>> and I'm trying to understand it
>>>>>
>>>>> Thank you
>>>>> Angelo
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>     Claudio Martella
>>>>
>>>>
>>>>
>>>
>>
>


-- 
   Claudio Martella

Re: Information

Posted by Angelo Immediata <an...@gmail.com>.
hi Sebastian

OK...I got it....I was thinking I could use it for an online scenario......
Thank you

Angelo


2014-03-26 14:52 GMT+01:00 Sebastian Schelter <ss...@apache.org>:

> Hi Angelo,
>
> It very much depends on your use case. Do you want to precompute paths
> offline in batch or are you looking for a system that answers online?
> Giraph has been built for the first scenario.
>
> --sebastian
>
>
> On 03/26/2014 02:48 PM, Angelo Immediata wrote:
>
>> hi Claudio
>>
>> so, if I understood correctly, it has no sense to use Giraph for shortest
>> path calculation in my scenario
>>
>> Am I right?
>>
>>
>> 2014-03-26 13:27 GMT+01:00 Claudio Martella <cl...@gmail.com>:
>>
>>  It looks like you're expecting to use Giraph in an online fashion, such
>>> as
>>> you would use a database to answer queries within milliseconds or
>>> seconds.
>>> Giraph is an offline batch processing system.
>>>
>>>
>>> On Wed, Mar 26, 2014 at 11:11 AM, Angelo Immediata <angeloimm@gmail.com
>>> >wrote:
>>>
>>>  Hi there
>>>>
>>>> In my project I have to implement a routing system with good
>>>> performance;
>>>> at the beginning this system should be able in giving routes information
>>>> only for one italian region (Lombardia) but it could be used for the
>>>> whole
>>>> Italy (or world....)
>>>> Let's stop to the Lombardia for now. By reading OSM files I can create
>>>> my
>>>> own graph in the best format i can use it; then I need to use Dijkstra
>>>> (or
>>>> any other algorithm) in order to propose to the user K possible paths
>>>> from
>>>> point A to point B (K becouse i need to show to the customer also the
>>>> alternatives). I can't use Contraction Herarchy algorithm becouse I
>>>> need to
>>>> take care of external events that can modify the weights on my built
>>>> graph
>>>> and this implies that I should create the "contracted" graph once again
>>>> and
>>>> this can be a very onerous operation
>>>>
>>>> By my experimentations, I saw that by reading the Lombardia OSM file I
>>>> should create a graph with around 1 million of vertexes and 6 million of
>>>> edges and I was thinking to use Giraph to solve my issue (I saw this
>>>> link
>>>> http://giraph.apache.org/intro.html where you talked about
>>>> shortestpaths
>>>> problem
>>>> I have a couple of question for you giraph/hadoop gurus
>>>>
>>>>     - does it make sense to use giraph for my scenario?
>>>>     - must i respect some graph format to pass to the giraph algorithm
>>>> in
>>>>
>>>>     order to have K shortest paths from point A to point B? If
>>>> so....which
>>>>     format should I respect?
>>>>     - what would be perfomance by using giraph? I know that Dijstra
>>>>
>>>>     algorithm problem is that it is slow.....by using giraph will I be
>>>> able in
>>>>     improving its performances on very large graph?
>>>>
>>>> I know these can seem very basic questions, but I'm pretty new to giraph
>>>> and I'm trying to understand it
>>>>
>>>> Thank you
>>>> Angelo
>>>>
>>>>
>>>
>>>
>>> --
>>>     Claudio Martella
>>>
>>>
>>>
>>
>

Re: Information

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Angelo,

It very much depends on your use case. Do you want to precompute paths 
offline in batch or are you looking for a system that answers online? 
Giraph has been built for the first scenario.

--sebastian

On 03/26/2014 02:48 PM, Angelo Immediata wrote:
> hi Claudio
>
> so, if I understood correctly, it has no sense to use Giraph for shortest
> path calculation in my scenario
>
> Am I right?
>
>
> 2014-03-26 13:27 GMT+01:00 Claudio Martella <cl...@gmail.com>:
>
>> It looks like you're expecting to use Giraph in an online fashion, such as
>> you would use a database to answer queries within milliseconds or seconds.
>> Giraph is an offline batch processing system.
>>
>>
>> On Wed, Mar 26, 2014 at 11:11 AM, Angelo Immediata <an...@gmail.com>wrote:
>>
>>> Hi there
>>>
>>> In my project I have to implement a routing system with good performance;
>>> at the beginning this system should be able in giving routes information
>>> only for one italian region (Lombardia) but it could be used for the whole
>>> Italy (or world....)
>>> Let's stop to the Lombardia for now. By reading OSM files I can create my
>>> own graph in the best format i can use it; then I need to use Dijkstra (or
>>> any other algorithm) in order to propose to the user K possible paths from
>>> point A to point B (K becouse i need to show to the customer also the
>>> alternatives). I can't use Contraction Herarchy algorithm becouse I need to
>>> take care of external events that can modify the weights on my built graph
>>> and this implies that I should create the "contracted" graph once again and
>>> this can be a very onerous operation
>>>
>>> By my experimentations, I saw that by reading the Lombardia OSM file I
>>> should create a graph with around 1 million of vertexes and 6 million of
>>> edges and I was thinking to use Giraph to solve my issue (I saw this link
>>> http://giraph.apache.org/intro.html where you talked about shortestpaths
>>> problem
>>> I have a couple of question for you giraph/hadoop gurus
>>>
>>>     - does it make sense to use giraph for my scenario?
>>>     - must i respect some graph format to pass to the giraph algorithm in
>>>     order to have K shortest paths from point A to point B? If so....which
>>>     format should I respect?
>>>     - what would be perfomance by using giraph? I know that Dijstra
>>>     algorithm problem is that it is slow.....by using giraph will I be able in
>>>     improving its performances on very large graph?
>>>
>>> I know these can seem very basic questions, but I'm pretty new to giraph
>>> and I'm trying to understand it
>>>
>>> Thank you
>>> Angelo
>>>
>>
>>
>>
>> --
>>     Claudio Martella
>>
>>
>


Re: Information

Posted by Angelo Immediata <an...@gmail.com>.
hi Claudio

so, if I understood correctly, it has no sense to use Giraph for shortest
path calculation in my scenario

Am I right?


2014-03-26 13:27 GMT+01:00 Claudio Martella <cl...@gmail.com>:

> It looks like you're expecting to use Giraph in an online fashion, such as
> you would use a database to answer queries within milliseconds or seconds.
> Giraph is an offline batch processing system.
>
>
> On Wed, Mar 26, 2014 at 11:11 AM, Angelo Immediata <an...@gmail.com>wrote:
>
>> Hi there
>>
>> In my project I have to implement a routing system with good performance;
>> at the beginning this system should be able in giving routes information
>> only for one italian region (Lombardia) but it could be used for the whole
>> Italy (or world....)
>> Let's stop to the Lombardia for now. By reading OSM files I can create my
>> own graph in the best format i can use it; then I need to use Dijkstra (or
>> any other algorithm) in order to propose to the user K possible paths from
>> point A to point B (K becouse i need to show to the customer also the
>> alternatives). I can't use Contraction Herarchy algorithm becouse I need to
>> take care of external events that can modify the weights on my built graph
>> and this implies that I should create the "contracted" graph once again and
>> this can be a very onerous operation
>>
>> By my experimentations, I saw that by reading the Lombardia OSM file I
>> should create a graph with around 1 million of vertexes and 6 million of
>> edges and I was thinking to use Giraph to solve my issue (I saw this link
>> http://giraph.apache.org/intro.html where you talked about shortestpaths
>> problem
>> I have a couple of question for you giraph/hadoop gurus
>>
>>    - does it make sense to use giraph for my scenario?
>>    - must i respect some graph format to pass to the giraph algorithm in
>>    order to have K shortest paths from point A to point B? If so....which
>>    format should I respect?
>>    - what would be perfomance by using giraph? I know that Dijstra
>>    algorithm problem is that it is slow.....by using giraph will I be able in
>>    improving its performances on very large graph?
>>
>> I know these can seem very basic questions, but I'm pretty new to giraph
>> and I'm trying to understand it
>>
>> Thank you
>> Angelo
>>
>
>
>
> --
>    Claudio Martella
>
>

Re: Information

Posted by Claudio Martella <cl...@gmail.com>.
It looks like you're expecting to use Giraph in an online fashion, such as
you would use a database to answer queries within milliseconds or seconds.
Giraph is an offline batch processing system.


On Wed, Mar 26, 2014 at 11:11 AM, Angelo Immediata <an...@gmail.com>wrote:

> Hi there
>
> In my project I have to implement a routing system with good performance;
> at the beginning this system should be able in giving routes information
> only for one italian region (Lombardia) but it could be used for the whole
> Italy (or world....)
> Let's stop to the Lombardia for now. By reading OSM files I can create my
> own graph in the best format i can use it; then I need to use Dijkstra (or
> any other algorithm) in order to propose to the user K possible paths from
> point A to point B (K becouse i need to show to the customer also the
> alternatives). I can't use Contraction Herarchy algorithm becouse I need to
> take care of external events that can modify the weights on my built graph
> and this implies that I should create the "contracted" graph once again and
> this can be a very onerous operation
>
> By my experimentations, I saw that by reading the Lombardia OSM file I
> should create a graph with around 1 million of vertexes and 6 million of
> edges and I was thinking to use Giraph to solve my issue (I saw this link
> http://giraph.apache.org/intro.html where you talked about shortestpaths
> problem
> I have a couple of question for you giraph/hadoop gurus
>
>    - does it make sense to use giraph for my scenario?
>    - must i respect some graph format to pass to the giraph algorithm in
>    order to have K shortest paths from point A to point B? If so....which
>    format should I respect?
>    - what would be perfomance by using giraph? I know that Dijstra
>    algorithm problem is that it is slow.....by using giraph will I be able in
>    improving its performances on very large graph?
>
> I know these can seem very basic questions, but I'm pretty new to giraph
> and I'm trying to understand it
>
> Thank you
> Angelo
>



-- 
   Claudio Martella

Re: Information

Posted by Angelo Immediata <an...@gmail.com>.
hi Sebastian

thnx for the answer

I was giving a look to cassovary, but I'm losing hot to integrate it with
neo4j and/or hor to calculate paths with it.....Do you have any tips?

Angelo


2014-03-26 11:31 GMT+01:00 Sebastian Schelter <ss...@googlemail.com>:

> For such a small graph, using a single machine  graph processing system
> makes more sense imho. Should be faster and easier to program. Google for
> cassovary.
> Am 26.03.2014 10:12 schrieb "Angelo Immediata" <an...@gmail.com>:
>
> Hi there
>>
>> In my project I have to implement a routing system with good performance;
>> at the beginning this system should be able in giving routes information
>> only for one italian region (Lombardia) but it could be used for the whole
>> Italy (or world....)
>> Let's stop to the Lombardia for now. By reading OSM files I can create my
>> own graph in the best format i can use it; then I need to use Dijkstra (or
>> any other algorithm) in order to propose to the user K possible paths from
>> point A to point B (K becouse i need to show to the customer also the
>> alternatives). I can't use Contraction Herarchy algorithm becouse I need to
>> take care of external events that can modify the weights on my built graph
>> and this implies that I should create the "contracted" graph once again and
>> this can be a very onerous operation
>>
>> By my experimentations, I saw that by reading the Lombardia OSM file I
>> should create a graph with around 1 million of vertexes and 6 million of
>> edges and I was thinking to use Giraph to solve my issue (I saw this link
>> http://giraph.apache.org/intro.html where you talked about shortestpaths
>> problem
>> I have a couple of question for you giraph/hadoop gurus
>>
>>    - does it make sense to use giraph for my scenario?
>>    - must i respect some graph format to pass to the giraph algorithm in
>>    order to have K shortest paths from point A to point B? If so....which
>>    format should I respect?
>>    - what would be perfomance by using giraph? I know that Dijstra
>>    algorithm problem is that it is slow.....by using giraph will I be able in
>>    improving its performances on very large graph?
>>
>> I know these can seem very basic questions, but I'm pretty new to giraph
>> and I'm trying to understand it
>>
>> Thank you
>> Angelo
>>
>

Re: Information

Posted by Sebastian Schelter <ss...@googlemail.com>.
For such a small graph, using a single machine  graph processing system
makes more sense imho. Should be faster and easier to program. Google for
cassovary.
Am 26.03.2014 10:12 schrieb "Angelo Immediata" <an...@gmail.com>:

> Hi there
>
> In my project I have to implement a routing system with good performance;
> at the beginning this system should be able in giving routes information
> only for one italian region (Lombardia) but it could be used for the whole
> Italy (or world....)
> Let's stop to the Lombardia for now. By reading OSM files I can create my
> own graph in the best format i can use it; then I need to use Dijkstra (or
> any other algorithm) in order to propose to the user K possible paths from
> point A to point B (K becouse i need to show to the customer also the
> alternatives). I can't use Contraction Herarchy algorithm becouse I need to
> take care of external events that can modify the weights on my built graph
> and this implies that I should create the "contracted" graph once again and
> this can be a very onerous operation
>
> By my experimentations, I saw that by reading the Lombardia OSM file I
> should create a graph with around 1 million of vertexes and 6 million of
> edges and I was thinking to use Giraph to solve my issue (I saw this link
> http://giraph.apache.org/intro.html where you talked about shortestpaths
> problem
> I have a couple of question for you giraph/hadoop gurus
>
>    - does it make sense to use giraph for my scenario?
>    - must i respect some graph format to pass to the giraph algorithm in
>    order to have K shortest paths from point A to point B? If so....which
>    format should I respect?
>    - what would be perfomance by using giraph? I know that Dijstra
>    algorithm problem is that it is slow.....by using giraph will I be able in
>    improving its performances on very large graph?
>
> I know these can seem very basic questions, but I'm pretty new to giraph
> and I'm trying to understand it
>
> Thank you
> Angelo
>