Posted to user@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2009/04/28 16:42:48 UTC

GEO-IP as User Defined Function

Hey all,
You may all be familiar with geo-ip from maxmind:
http://www.maxmind.com/app/api. It is licensed under the GNU General
Public License (GPL).

I am running a process where I have to geolocate IP addresses. I
think this would make a good UDF. Right now I am using an external
map reduce process whose output is inserted back into Hive.

GEO-CITY(columnname)
GEO-STATE(columnname)

The only drawback I can see is that geo-ip requires its database files
to be on the local file system. However, the functions could return
NULL if the local files do not exist.
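
A rough sketch of that NULL fallback, in plain Java without any Hive or
maxmind classes. Everything named here is an assumption for
illustration: the class GeoCitySketch, the database path, and the
lookup function, which is only a stand-in for whatever call the geo-ip
library actually provides.

```java
import java.io.File;
import java.util.function.Function;

// Hypothetical sketch, not a real Hive UDF: shows only the
// "return NULL when the local database file is missing" behaviour.
class GeoCitySketch {
    private final File database;
    // Stand-in for the real geo-ip lookup call (an assumption).
    private final Function<String, String> lookup;

    GeoCitySketch(String databasePath, Function<String, String> lookup) {
        this.database = new File(databasePath);
        this.lookup = lookup;
    }

    // Returns the city for an IP, or null if the input is null or the
    // database file is not present on the local file system.
    String evaluate(String ip) {
        if (ip == null || !database.isFile()) {
            return null;
        }
        return lookup.apply(ip);
    }
}
```

With a path that does not exist on the node, evaluate() returns null
for every row instead of failing the query.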

Does anyone think these would be useful in hive?

Edward

Re: GEO-IP as User Defined Function

Posted by Edward Capriolo <ed...@gmail.com>.
Making it dynamically configurable seems a bit complicated. I will
consider publishing it myself.

Re: GEO-IP as User Defined Function

Posted by Dhruba Borthakur <dh...@gmail.com>.
We cannot put GPL code into Hive... licenses are incompatible.

You can make it a dynamically configurable parameter. If the relevant
classes are in the CLASSPATH, then they will be invoked. Otherwise, the
stubs (built into Hive) can throw an exception. A customer can download
the maxmind stuff into his Hive install and then set the config
parameter appropriately to make Hive use them.

thanks,
dhruba
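
The classpath-based approach described above could look roughly like
the following. This is only a sketch under assumptions: the
GeoProvider interface, the StubProvider class, and the idea of passing
the class name in as a string (for example from a config parameter)
are all hypothetical, not existing Hive APIs.

```java
// Hypothetical sketch of "use the real classes if present, else a stub".
class GeoProviderLoader {
    // Assumed interface that a maxmind-backed class would implement.
    interface GeoProvider {
        String city(String ip);
    }

    // Built-in stub used when the configured class cannot be loaded.
    static final class StubProvider implements GeoProvider {
        @Override
        public String city(String ip) {
            throw new UnsupportedOperationException(
                "geo-ip classes not found on the CLASSPATH");
        }
    }

    // className would come from a config parameter (name is an assumption).
    static GeoProvider load(String className) {
        if (className == null || className.isEmpty()) {
            return new StubProvider();
        }
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(GeoProvider.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Class missing, not instantiable, or of the wrong type.
            return new StubProvider();
        }
    }
}
```

The stub keeps Hive itself free of any GPL code; only users who drop
the extra jar into their install get a working provider.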



Re: GEO-IP as User Defined Function

Posted by Prasad Chakka <pc...@facebook.com>.
I think it would be very useful to have in Hive (or rather as a contrib in Hive), but I don't think the GPL is compatible with the Apache license. So putting the maxmind code/data in Hive is a no-go.

Prasad

