Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2011/02/10 00:22:02 UTC

Using a file packaged into a UDF jar?

I am trying to implement a MaxMind lookup UDF that does not require putting
the MaxMind database file on the nodes.

I referred to this thread:
http://web.archiveorange.com/archive/v/3inw3FVtG19NUTr25Yra
and tried to combine it with the method in this post:
http://blog.data-miners.com/2009/12/hadoop-and-mapreduce-what-country-is-ip.html

Here are the contents of my jar:

META-INF/
META-INF/MANIFEST.MF
maxmind/
maxmind/com/
maxmind/com/maxmind/
maxmind/com/maxmind/geoip/
maxmind/com/maxmind/geoip/Country.class
maxmind/com/maxmind/geoip/DatabaseInfo.class
maxmind/com/maxmind/geoip/Location.class
maxmind/com/maxmind/geoip/LookupService.class
maxmind/com/maxmind/geoip/Region.class
maxmind/com/maxmind/geoip/regionName.class
maxmind/com/maxmind/geoip/timeZone.class
maxmind/ip2country.class
GeoIp.dat

So, as you can see, the file is there. However, instantiating the lookup
service with it fails. The UDF is attached below. It prints the path
jar:file:/home/jcoveney/udfs/maxmind/jar/maxmind.jar!/GeoIp.dat, so I think
I'm almost there. The question is: what form does this path need to take so
that Pig can reach GeoIp.dat at execution time? I tried without the full
path, without jar:, and without file:, and I really just don't know.

Any ideas?

package maxmind;

import java.io.IOException;
import java.net.URL;

import org.apache.pig.EvalFunc;
import org.apache.pig.PigException;
import org.apache.pig.data.Tuple;
import org.apache.pig.backend.executionengine.ExecException;

import maxmind.com.maxmind.geoip.*;

public class ip2country extends EvalFunc<String> {
        public LookupService iplookupservice;
        public static final String DEFAULT_LOCATION = "/GeoIp.dat";

        public ip2country() throws IOException {
                this(DEFAULT_LOCATION);
        }

        public ip2country(String GeoIpFile) throws IOException {
                // Find the .dat file bundled at the root of the UDF jar
                // (use the argument rather than always DEFAULT_LOCATION).
                URL resource = getClass().getResource(GeoIpFile);
                if (resource == null) {
                        throw new IOException("Resource not found: " + GeoIpFile);
                }
                // Prints jar:file:/home/jcoveney/udfs/maxmind/jar/maxmind.jar!/GeoIp.dat
                String filename = resource.toExternalForm();
                System.out.println(filename);
                // Instantiation fails here: LookupService is handed a jar: URL
                // rather than a path on the local filesystem.
                iplookupservice = new LookupService(filename,
                                LookupService.GEOIP_MEMORY_CACHE | LookupService.GEOIP_CHECK_CACHE);
        }

        @Override
        public String exec(Tuple input) throws IOException {
                if (input == null || input.size() == 0)
                        return null;
                try {
                        // Placeholder: the actual lookup is not written yet.
                        return "hi";
                } catch (Exception e) {
                        int errCode = 31415;
                        String msg = "Error while performing maxmind lookup in "
                                        + this.getClass().getSimpleName();
                        throw new ExecException(msg, errCode, PigException.BUG, e);
                }
        }
}
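To see why none of the path variants work, it helps to check what getResource actually returns. The sketch below (JDK only, Java 7 or later; the class name JarResourceDemo and the temporary jar it builds are made up for illustration) creates a small jar with a root-level GeoIp.dat entry, loads it through a URLClassLoader, and shows that the resulting jar:file:...!/GeoIp.dat string names no file on disk, even though the entry is readable as a stream. Any API that opens its argument as a plain file, which is what the replies in this thread say the MaxMind LookupService constructor expects, will therefore fail when given this string.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical demo class; not part of Pig or MaxMind.
final class JarResourceDemo {

    private JarResourceDemo() {}

    /** Writes a tiny jar with one top-level entry, mimicking the UDF jar layout. */
    static Path buildJar(String entryName, byte[] content) throws IOException {
        Path jar = Files.createTempFile("udf-", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content);
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildJar("GeoIp.dat", new byte[] {42});
        // A null parent keeps the lookup confined to our temporary jar.
        try (URLClassLoader cl = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null)) {
            URL url = cl.getResource("GeoIp.dat");
            // Same shape as the path printed by the UDF: jar:file:/...!/GeoIp.dat
            System.out.println(url.toExternalForm());
            // Interpreted as a filesystem path, that string points at nothing.
            System.out.println(new File(url.toExternalForm()).exists()); // false
            // The entry itself is still readable as a stream.
            try (InputStream in = cl.getResourceAsStream("GeoIp.dat")) {
                System.out.println(in.read()); // 42
            }
        }
    }
}
```

Stripping jar: or file: from the string does not help either, because the data lives inside the jar archive rather than as a separate file.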

Re: Using a file packaged into a UDF jar?

Posted by Jonathan Coveney <jc...@gmail.com>.
It does require the filename, but is there a way to point it at the file
inside the jar? I can imagine some hackier ways of doing it (and I'm not
averse to changing the MaxMind API's loader to make this work), but I would
really love to be able to read it from the jar, as that is a pretty elegant
solution. I just don't know what is involved in accessing a file inside the
jar from Java.
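From the Java side, reading a file packaged in the jar means going through the class loader instead of a filesystem path: ClassLoader.getResourceAsStream returns an InputStream over the jar entry. A minimal sketch, assuming a modified loader that could accept the database as a byte array (that modified loader is hypothetical; this does not claim the MaxMind library offers such a constructor). The class name ResourceBytes is also made up for illustration, and the copy loop avoids newer JDK helpers so it stays close to what a 2011-era JVM provides.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration.
final class ResourceBytes {

    private ResourceBytes() {}

    /**
     * Reads a classpath resource (for example "GeoIp.dat" at the root of the
     * UDF jar) fully into memory. No filesystem path is involved, so this
     * works for entries inside a jar. Note: ClassLoader resource names have
     * no leading slash.
     */
    static byte[] readFully(ClassLoader loader, String name) throws IOException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // Copy until end of stream; read() may return fewer bytes than requested.
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        } finally {
            in.close();
        }
    }
}
```

Inside the UDF this would be called as ResourceBytes.readFully(getClass().getClassLoader(), "GeoIp.dat"). The catch is the one raised in this thread: the bytes are only useful if the lookup code can be changed to take them instead of a filename.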

Does anyone have tips or hints? In general, bundling a data file into the
UDF jar is a nice way to get around the problem of distributing common files
to the nodes.

As always, thanks.

2011/2/9 Charles Gonçalves <ch...@gmail.com>

> The problem isn't with Pig; it's with the MaxMind lib. If I remember
> correctly (I'm not so sure right now), it requires a filename: the
> MaxMind constructor expects you to pass it the path to the .dat file.
>
> I ran into this problem too and couldn't find a better solution than
> copying the file to the nodes.
>
> One ugly workaround would be to open the file as a stream, write it to a
> temporary folder, and pass that file's path to the MaxMind lib.

Re: Using a file packaged into a UDF jar?

Posted by Charles Gonçalves <ch...@gmail.com>.
The problem isn't with Pig; it's with the MaxMind lib. If I remember
correctly (I'm not so sure right now), it requires a filename: the MaxMind
constructor expects you to pass it the path to the .dat file.

I ran into this problem too and couldn't find a better solution than
copying the file to the nodes.

One ugly workaround would be to open the file as a stream, write it to a
temporary folder, and pass that file's path to the MaxMind lib.
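The "ugly workaround" above can be sketched as follows, assuming Java 7 or later for try-with-resources and Files.copy (a Java 6 cluster would need a manual read/write loop instead). The class name ResourceExtractor and its method are hypothetical; the idea is simply to copy the jar entry out through the class loader and hand the library a real path on local disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class for illustration.
final class ResourceExtractor {

    private ResourceExtractor() {}

    /**
     * Copies a classpath resource (e.g. "GeoIp.dat" at the root of the UDF jar)
     * into a fresh temporary file and returns that file's absolute path, which
     * can then be passed to a library that only accepts filenames.
     */
    static String extractToTempFile(ClassLoader loader, String name) throws IOException {
        // try-with-resources skips close() if the resource is null.
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + name);
            }
            File tmp = File.createTempFile("geoip-", ".dat");
            tmp.deleteOnExit(); // best-effort cleanup when the JVM exits normally
            // REPLACE_EXISTING is needed because createTempFile already created the file.
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tmp.getAbsolutePath();
        }
    }
}
```

In the UDF constructor this would replace the jar: URL, roughly: iplookupservice = new LookupService(ResourceExtractor.extractToTempFile(getClass().getClassLoader(), "GeoIp.dat"), LookupService.GEOIP_MEMORY_CACHE | LookupService.GEOIP_CHECK_CACHE); Each task JVM extracts its own copy, which costs some local disk and startup time but means nothing has to be installed on the nodes ahead of time.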
