Posted to user@spark.apache.org by Filli Alem <Al...@ti8m.ch> on 2015/07/29 21:04:32 UTC
IP2Location within spark jobs
Hi,
I would like to use ip2Location databases during my spark jobs (MaxMind).
So far I haven't found a way to properly serialize the database offered by the Java API of the database.
The CSV version isn't easy to handle either, as it consists of multiple files.
Any recommendations on how to do this?
Thanks
Alem
RE: IP2Location within spark jobs
Posted by "Young, Matthew T" <ma...@intel.com>.
You can put the database files in a central location accessible to all the workers, then build the GeoIP object once per partition inside a mapPartitions over your dataset, loading it from that central location.
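A minimal sketch of that per-partition pattern, using plain Python iterators to stand in for Spark's mapPartitions. `FakeGeoDB` and the `/shared/...` path are hypothetical placeholders; in a real job you would construct the MaxMind reader from the shared .mmdb file here instead.

```python
class FakeGeoDB:
    """Stand-in for a GeoIP reader loaded from a central file (hypothetical)."""

    def __init__(self, path):
        self.path = path
        # Toy lookup table in place of the real MaxMind database.
        self.table = {"1.2.3.4": "CH", "5.6.7.8": "US"}

    def country(self, ip):
        return self.table.get(ip, "unknown")


def geolocate_partition(ip_iter, db_path="/shared/GeoLite2-City.mmdb"):
    # Build the (expensive, non-serializable) reader once per partition,
    # not once per record -- this is what mapPartitions buys you.
    db = FakeGeoDB(db_path)
    for ip in ip_iter:
        yield (ip, db.country(ip))


# With Spark this would be: rdd.mapPartitions(geolocate_partition)
partitions = [["1.2.3.4"], ["5.6.7.8", "9.9.9.9"]]
results = [pair for part in partitions
           for pair in geolocate_partition(iter(part))]
print(results)
```

Because the reader is created inside the function that runs on the worker, nothing non-serializable ever crosses the driver/worker boundary.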