Posted to user@pig.apache.org by Ross Nordeen <rj...@mtu.edu> on 2011/07/11 20:57:56 UTC

GeoIP database lookups

Hello all,

Is there an accepted way to use the GeoIP database with Pig?

I've found that some people have tried writing UDFs with MaxMind's Java API:
http://www.maxmind.com/java

Others suggest using Pig's streaming interface and running the queries through a Perl script:
http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/#comments

I'm just trying to find the most efficient way to run this. Any ideas?

-Ross

Re: GeoIP database lookups

Posted by Matt Davies <ma...@mattdavies.net>.
We wrote a snazzy UDF that does one initialization per mapper and handles all
the necessary conversions. It is quite efficient and fast.

The trick to maintainability is to have your UDF load locations.csv from
HDFS at initialization rather than bundling the csv file inside your jar.
That way you can easily update the locations without recompiling.

-Matt
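
For readers who want something concrete, here is a minimal sketch of the "initialize once, then look up" pattern described above. It is not Matt's UDF, whose code is not shown in this thread. The class name GeoLookup, the Opener hook, and the three-column "startIpNum,endIpNum,location" line format are all assumptions made for illustration; a real locations.csv may be laid out differently. To keep the example free of Pig and Hadoop dependencies, the file is read through a plain Reader. In an actual UDF, the lookup would presumably be called from the exec() method of a class extending Pig's EvalFunc, and the Opener would wrap a stream opened from HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lookup core that a Pig UDF's exec() could delegate to.
class GeoLookup {
    // Hypothetical hook that opens the locations file. In a real UDF this
    // would wrap an HDFS input stream; here any Reader will do.
    interface Opener {
        Reader open() throws IOException;
    }

    // One IP range; the map key holds the range start.
    private static final class Range {
        final long end;
        final String location;
        Range(long end, String location) {
            this.end = end;
            this.location = location;
        }
    }

    private final Opener opener;
    private TreeMap<Long, Range> ranges; // stays null until the first lookup

    GeoLookup(Opener opener) {
        this.opener = opener;
    }

    // Convert a dotted-quad IPv4 string to its numeric form.
    static long ipToLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Parse the assumed "startIpNum,endIpNum,location" lines into a sorted map.
    private static TreeMap<Long, Range> load(Reader source) throws IOException {
        TreeMap<Long, Range> map = new TreeMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blanks and comment lines
                }
                String[] f = line.split(",", 3);
                if (f.length < 3) {
                    continue; // skip malformed rows instead of failing the task
                }
                long start = Long.parseLong(f[0].trim());
                long end = Long.parseLong(f[1].trim());
                map.put(start, new Range(end, f[2].trim()));
            }
        }
        return map;
    }

    // Returns the location for ip, or null when no range covers it.
    String lookup(String ip) throws IOException {
        if (ranges == null) {
            // Runs once per instance, so the file is read a single time.
            ranges = load(opener.open());
        }
        long key = ipToLong(ip);
        Map.Entry<Long, Range> hit = ranges.floorEntry(key);
        return (hit != null && key <= hit.getValue().end)
                ? hit.getValue().location
                : null;
    }
}
```

Because the table is built on the first call and kept in a field, every later call is just a TreeMap.floorEntry search, and replacing the file on HDFS changes the results of the next job without rebuilding the jar.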

On Mon, Jul 11, 2011 at 12:57 PM, Ross Nordeen <rj...@mtu.edu> wrote:

> Hello all,
>
> Is there an accepted way to use the GeoIP database with pig?
>
> I've found some people have tried to write UDF's with their java api.
> http://www.maxmind.com/java
>
> Others say to use the streaming interface within pig and run the queries
> through a perl script.
>
> http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/#comments
>
> I'm just trying to find the most efficient way to run this.  any ideas?
>
> -Ross
>