Posted to user@spark.apache.org by kodonnell <ka...@datamine.com> on 2016/09/08 21:45:05 UTC

Graphhopper/routing in Spark

Just wondering if anyone has experience running GraphHopper (or similar)
in Spark?

In short, I can get it running on the master, but not on the worker nodes. The
key trouble seems to be that GraphHopper depends on a pre-processed graph,
which it builds from OSM data. In normal (desktop) use, it pre-processes the
graph and then caches it to disk. My current thinking is that I could create
the cache locally, put it in HDFS, and tweak GraphHopper to read from the HDFS
source. Alternatively, I could try to broadcast the cache (or the entire
GraphHopper instance), though I believe that would require both to be
serializable (which I've got little clue about). Does anyone have any
recommendations on the above?
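
For concreteness, I imagine the HDFS option might look something like this -
an untested sketch, assuming the cache folder has already been built locally
and uploaded (the paths are made up, and the recursive flag on addFile may
depend on your Spark version):

    import org.apache.spark.SparkFiles

    // Ship the pre-built GraphHopper cache folder to every executor.
    // recursive = true needs a Hadoop-backed path such as hdfs://.
    sc.addFile("hdfs:///data/graphhopper-cache", recursive = true)

    // On an executor, the local copy can then be resolved with:
    val localGraphDir = SparkFiles.get("graphhopper-cache")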

In addition, I'm not quite sure how to structure things to minimise the cache
reading - I don't want to read the cache (and initialise GraphHopper) for,
say, every route, as that's likely to be slow. It'd be nice if this were done
only once per partition, with all the routes in the partition processed by the
same GraphHopper instance. Again, any thoughts on this?
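
For what it's worth, the shape I have in mind is mapPartitions with one
GraphHopper instance per partition - a rough, untested sketch (the GraphHopper
calls are from memory of the 0.x API, the cache path and Trip type are made
up, and depending on version you may also need to set the encoding manager
before loading):

    import com.graphhopper.{GHRequest, GraphHopper}

    case class Trip(fromLat: Double, fromLon: Double, toLat: Double, toLon: Double)
    val trips = sc.parallelize(Seq(Trip(-36.85, 174.76, -36.84, 174.77)))

    val distances = trips.mapPartitions { iter =>
      // One GraphHopper per partition: pay the load cost once, then
      // reuse the instance for every route in the partition.
      val hopper = new GraphHopper()
      hopper.load("/local/graph-cache")  // pre-built cache on each worker
      iter.map { t =>
        val rsp = hopper.route(new GHRequest(t.fromLat, t.fromLon, t.toLat, t.toLon))
        rsp.getBest.getDistance
      }
    }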

FYI, there's a discussion on the GraphHopper forum here
<https://discuss.graphhopper.com/t/how-to-use-graphhopper-in-spark/998>,
though no luck there so far.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Graphhopper-routing-in-Spark-tp27682.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


RE: Graphhopper/routing in Spark

Posted by Kane O'Donnell <Ka...@datamine.com>.
It's not obvious to me either = ) I was thinking more along the lines of
retrieving the graph from HDFS/Spark, merging it together (which should be
taken care of by sc.textFile) and then giving it to GraphHopper. Alternatively,
I guess I could just put the graph locally on every worker node. Or broadcast
it - I must be able to broadcast a chunk of byte data? (On disk, the contracted
graph is only 30 MB.)
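
If broadcasting bytes does work, I imagine it would go roughly like this
(untested; assumes the cache folder has been zipped up beforehand, and a real
version would need to unzip on the executor and guard against concurrent
writes to the same local path):

    import java.nio.file.{Files, Paths}

    // Driver: read the zipped graph cache and broadcast the raw bytes -
    // at ~30 MB this should be comfortably broadcastable.
    val graphBytes = Files.readAllBytes(Paths.get("/local/graph-cache.zip"))
    val graphBc = sc.broadcast(graphBytes)

    val routed = trips.mapPartitions { iter =>
      // Executor: materialise the bytes to local disk once, then unzip
      // and point GraphHopper at the extracted folder (those steps
      // elided - same per-partition pattern as before).
      val local = Paths.get("/tmp/graph-cache.zip")
      if (!Files.exists(local)) Files.write(local, graphBc.value)
      iter
    }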

I hadn't considered GraphX. It doesn't look suitable: it's likely to be
considerably slower, and it doesn't do all of the nice stuff GraphHopper does
(e.g. vehicle-specific routing, including importing and processing OSM data).

Kane




Re: Graphhopper/routing in Spark

Posted by Robin East <ro...@xense.co.uk>.
It's not obvious to me how that would work. In principle, I imagine you could
have your source data loaded into HDFS and read by GraphHopper instances
running on Spark workers. But a graph, by its nature, has items that may be
connected to any other item, so the GraphHopper instances would need a way of
dealing with that, and I presume GraphHopper is not designed that way. Spark's
graph processing library, GraphX, was designed that way: plenty of thought has
gone into how to distribute a graph across machines and still be able to run
algorithms on it.
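
For a flavour of what that looks like, here's a minimal GraphX sketch on toy
data (note that the built-in ShortestPaths computes hop counts, not weighted
road distances, so real routing would take considerably more work):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.graphx.lib.ShortestPaths

    // A toy road network: vertices are junctions, edges are segments.
    // GraphX partitions the graph across the cluster for you.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0), Edge(1L, 3L, 5.0)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // Shortest paths (in hops) from every vertex to the landmarks.
    val result = ShortestPaths.run(graph, landmarks = Seq(3L))
    result.vertices.collect.foreach(println)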
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action




