Posted to user@spark.apache.org by Filli Alem <Al...@ti8m.ch> on 2015/07/29 21:04:32 UTC

IP2Location within spark jobs

Hi,

I would like to use an IP2Location database (MaxMind) during my Spark jobs.
So far I haven't found a way to properly serialize the database object offered by the database's Java API.
The CSV version isn't easy to handle either, as it consists of multiple files.

Any recommendations on how to do this?

Thanks
Alem


RE: IP2Location within spark jobs

Posted by "Young, Matthew T" <ma...@intel.com>.
You can put the database files in a central location accessible to all the workers and build the GeoIP object once per partition when you do a mapPartitions across your dataset, loading from the central location.
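
A minimal sketch of that approach in Scala, assuming the MaxMind GeoIP2 Java API (com.maxmind.geoip2.DatabaseReader) and a .mmdb file sitting at a path every worker can read; the path and the sample IPs below are illustrative, not from the original thread:

import java.io.File
import java.net.InetAddress

import com.maxmind.geoip2.DatabaseReader
import org.apache.spark.{SparkConf, SparkContext}

object GeoIpLookup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("GeoIpLookup"))

    // Hypothetical path on a shared filesystem (e.g. an NFS mount) visible to all workers.
    val dbPath = "/mnt/shared/GeoLite2-City.mmdb"

    val ips = sc.parallelize(Seq("8.8.8.8", "1.1.1.1"))

    // Build the (non-serializable) reader once per partition instead of trying to
    // ship it from the driver; only the path string is captured by the closure.
    val countries = ips.mapPartitions { iter =>
      val reader = new DatabaseReader.Builder(new File(dbPath)).build()
      iter.map { ip =>
        val response = reader.city(InetAddress.getByName(ip))
        (ip, response.getCountry.getIsoCode)
      }
    }

    countries.collect().foreach(println)
    sc.stop()
  }
}

Because the reader is constructed inside mapPartitions, the open cost is paid once per partition rather than once per record, and nothing non-serializable ever crosses the wire.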

