Posted to user@pig.apache.org by Haitao Yao <ya...@gmail.com> on 2012/08/28 09:20:49 UTC
Add file command in Pig
Hi all,
I want to use GeoIP.dat in my Pig scripts. Does Pig have an "add file XXX" command like Hive does? I want to distribute the data file GeoIP.dat with Pig.
Or is there any other workaround?
I don't want to install GeoIP on every Hadoop node, so I want to distribute the data file with Pig itself.
Thanks.
Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype: haitao.yao.final
RE: Add file command in Pig
Posted by "Duckworth, Will" <wd...@comscore.com>.
I agree with Jon. We do it with the distributed cache. The GeoIP files that we use are updated monthly, so it makes more sense to put them in the cache than to recompile Pig every month.
Will Duckworth Senior Vice President, Software Engineering | comScore, Inc.(NASDAQ:SCOR)
o +1 (703) 438-2108 | m +1 (301) 606-2977 | mailto:wduckworth@comscore.com
-----Original Message-----
From: Jonathan Coveney [mailto:jcoveney@gmail.com]
Sent: Tuesday, August 28, 2012 1:53 PM
To: user@pig.apache.org
Subject: Re: Add file command in Pig
Using the distributed cache is the better approach, IMHO. The UDF that uses the file can add it to the distributed cache itself (this should be supported in Pig 0.9 and 0.10; I can check if you like).
If you want to ship it with Pig instead, you have to include it in the Pig jar, and then you can use it from the Pig script. It's a little tricky but doable, and a bit of a hack.
Re: Add file command in Pig
Posted by Jonathan Coveney <jc...@gmail.com>.
Using the distributed cache is the better approach, IMHO. The UDF that uses
the file can add it to the distributed cache itself (this should be supported
in Pig 0.9 and 0.10; I can check if you like).
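A rough sketch of what the distributed cache approach might look like in a UDF follows. It is kept free of Pig dependencies so it compiles on its own; in a real UDF the class would extend org.apache.pig.EvalFunc and getCacheFiles() would be an override. The HDFS path, the link name, and the class name are all assumptions made up for illustration, and the actual GeoIP lookup is left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch only. In a real UDF this class would extend
// org.apache.pig.EvalFunc<String>, getCacheFiles() would carry @Override,
// and exec(Tuple) would perform the lookup against the local file.
public class GeoIpLookupSketch {
    // Hypothetical HDFS location of the data file (an assumption).
    public static final String HDFS_PATH = "/data/geoip/GeoIP.dat";
    // Hypothetical symlink name created in each task's working directory.
    public static final String LINK_NAME = "GeoIP.dat";

    private File database; // resolved lazily, on the task node

    // Each entry uses the "hdfs_path#symlink" form: the file at hdfs_path is
    // shipped through the distributed cache and exposed locally as symlink.
    public List<String> getCacheFiles() {
        List<String> files = new ArrayList<String>();
        files.add(HDFS_PATH + "#" + LINK_NAME);
        return files;
    }

    // Resolve the cached copy relative to the task's working directory.
    // A real exec() would pass this file to the GeoIP library of choice.
    public File localDatabase() {
        if (database == null) {
            database = new File("./" + LINK_NAME);
        }
        return database;
    }

    public static void main(String[] args) {
        GeoIpLookupSketch udf = new GeoIpLookupSketch();
        System.out.println("Cache entries: " + udf.getCacheFiles());
        System.out.println("Local file: " + udf.localDatabase().getName());
    }
}
```

With this shape, updating the data file only means replacing it on HDFS; neither the UDF jar nor Pig needs rebuilding.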
If you want to ship it with Pig instead, you have to include it in the Pig
jar, and then you can use it from the Pig script. It's a little tricky but
doable, and a bit of a hack.
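One way to read the jar approach is that the data file is packed into a jar as a classpath resource and opened from Java code at runtime. The sketch below shows only that generic Java mechanism; the resource name and class name are assumptions for illustration, and how the file gets into the jar depends on your build.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch only: opens a data file that was bundled into a jar on the classpath.
public class BundledResourceSketch {
    // Hypothetical resource name; assumes GeoIP.dat sits at the jar root.
    public static final String RESOURCE = "/GeoIP.dat";

    // Returns a stream over the bundled resource, or fails with a clear
    // message when the resource is not on the classpath.
    public static InputStream open(String resource) throws IOException {
        InputStream in = BundledResourceSketch.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resource);
        }
        return in;
    }

    public static void main(String[] args) {
        try (InputStream in = open(RESOURCE)) {
            System.out.println("Found bundled resource " + RESOURCE);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The downside is the one already mentioned in this thread: every update to the data file means rebuilding the jar.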