Posted to user@pig.apache.org by Chaitanya Sharma <go...@gmail.com> on 2011/04/18 20:23:21 UTC
Elephant bird with Pig 0.8
Hi,
I am trying to get LZO support working for my little Pig 0.8 project.
I'm using https://github.com/gerritjvv/elephant-bird.git for the
Pig LZO loaders, and https://github.com/kevinweil/hadoop-lzo for the Hadoop
LZO support.
The Pig loader,
com.twitter.elephantbird.mapreduce.input.LzoTextLoader, doesn't seem to
be up to the job, and the map-reduce job it creates doesn't proceed
any further than 0%.
Has anyone been successful at this? If so, please share your experience.
Or what patch levels / git source branches should I be using to get this to
work?
Thanks,
Chaitanya
Re: Elephant bird with Pig 0.8
Posted by Chaitanya Sharma <go...@gmail.com>.
Hi Gerrit,
I don't know about GPB, but I am using LZO, and using
com.twitter.elephantbird.pig8.load.LzoTextLoader.
Thanks
chaitanya
Please find attached the hadoop mapred/ core site configuration files and a
pig-error-log.
The error log closely resembles the one in the
http://www.mail-archive.com/user@pig.apache.org/msg01009.html mail thread,
but the suggestions made there don't seem to work.
Please find my environment config / versions below.
*csharma $ hadoop version*
*Hadoop 0.20.2-cdh3u0*
Subversion -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14
Compiled by root on Sat Mar 26 00:12:30 UTC 2011
From source with checksum 6c1f62dddc4eac69b6b973c18bbc0f55
*csharma $ pig -version*
*Apache Pig version 0.8.0-cdh3u0 (rexported)*
compiled Mar 25 2011, 16:16:24
For LZO I am using the following, with build output snippets:
*csharma@hadoop-lzo $ git remote -v*
origin https://github.com/kevinweil/hadoop-lzo.git (fetch)
origin https://github.com/kevinweil/hadoop-lzo.git (push)
csharma@hadoop-lzo $ tree build/hadoop-lzo-0.4.10/lib/
build/hadoop-lzo-0.4.10/lib/
|-- commons-logging-1.0.4.jar
|-- commons-logging-api-1.0.4.jar
|-- junit-3.8.1.jar
`-- native
    `-- Linux-i386-32
        |-- libgplcompression.a
        |-- libgplcompression.la
        |-- libgplcompression.so -> libgplcompression.so.0.0.0
        |-- libgplcompression.so.0 -> libgplcompression.so.0.0.0
        `-- libgplcompression.so.0.0.0
csharma@hadoop-lzo $ ls -l build/
drwxr-xr-x 6 csharma csharma 4096 2011-04-18 08:11 hadoop-lzo-0.4.10
-rw-r--r-- 1 csharma csharma 59855 2011-04-18 08:11 hadoop-lzo-0.4.10.jar
-rw-r--r-- 1 csharma csharma 1810286 2011-04-18 08:11 hadoop-lzo-0.4.10.tar.gz
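A build tree like the one listed above can be sanity-checked by confirming that the native libgplcompression library is actually present under lib/native. The following is only a minimal sketch: the function name check_native_lzo and its single argument are illustrative and are not part of hadoop-lzo.

```shell
#!/bin/sh
# Hypothetical helper (not part of hadoop-lzo): report whether a
# hadoop-lzo lib directory contains the native libgplcompression library.
check_native_lzo() {
    lib_dir="$1"
    # Search the native/ subtree for any libgplcompression shared object.
    # Errors (e.g. a missing native/ directory) are silenced and treated
    # as "not found".
    found=$(find "$lib_dir/native" -name 'libgplcompression.so*' 2>/dev/null)
    if [ -n "$found" ]; then
        echo "native LZO library found"
        return 0
    fi
    echo "native LZO library missing"
    return 1
}
```

For the layout shown above, it would be called as: check_native_lzo build/hadoop-lzo-0.4.10/lib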
*For Pig 0.8 support, Elephant Bird:*
*csharma@elephant-bird-gerritjvv $ git remote -v *
origin https://github.com/gerritjvv/elephant-bird.git (fetch)
origin https://github.com/gerritjvv/elephant-bird.git (push)
https://github.com/dvryaboy/elephant-bird/tree/pig-08
*csharma@ubuntu:~/Projects/elephant-bird-dvryaboy$ git remote -v *
origin https://github.com/dvryaboy/elephant-bird.git (fetch)
origin https://github.com/dvryaboy/elephant-bird.git (push)
Also, I have been having a lot of problems building Elephant Bird *without*
thrift / protobuf.
I did find this project, http://code.google.com/p/hadoop-gpl-packing/, which
says EB can handle even Pig 0.8.
This confuses me: can I or can I not use Elephant Bird with Pig 0.8?
-------------------------------------------------------------------
I created a sample LZO-compressed data file, clean.txt.lzo, then indexed the
LZO file to verify correct LZO installation for my pseudo cluster:
*hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.10.jar
com.hadoop.compression.lzo.DistributedLzoIndexer /data/clean.txt.lzo*
But when running Pig jobs with the Pig LZO loaders from Elephant Bird, I've
always run into problems; the following either just crashes or keeps
running forever, spitting out GBs of garbage data.
*grunt> register /usr/lib/hadoop/lib/elephant-bird-2.0-SNAPSHOT.jar;*
*grunt> d = load '/data/clean.txt.lzo' using com.twitter.elephantbird.pig8.load.LzoTextLoader();*
|
|--Throws a boatload of errors, attached.
Please let me know how my configuration / versions are conflicting, and
which I should change to get this to work.
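Since the core-site configuration is part of what is being asked about, one quick check is whether that file mentions the hadoop-lzo codec class at all (codecs are typically listed under the io.compression.codecs property). This is a rough sketch only: the helper name has_lzo_codec is made up for illustration, and the class name com.hadoop.compression.lzo.LzoCodec is assumed from the hadoop-lzo package used by the indexer command above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a Hadoop config file mentions the
# hadoop-lzo codec class. The file path is supplied by the caller.
has_lzo_codec() {
    conf_file="$1"
    # -q: no output, only an exit status; -F: match the class name as a
    # fixed string rather than a regular expression.
    grep -qF 'com.hadoop.compression.lzo.LzoCodec' "$conf_file" 2>/dev/null
}
```

It could be used as: has_lzo_codec /etc/hadoop/conf/core-site.xml && echo "LZO codec registered" (the path here is an assumption about where the cluster keeps its configuration).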
On Mon, Apr 18, 2011 at 2:52 PM, Gerrit Jansen van Vuuren <
gerritjvv@googlemail.com> wrote:
> also,
>
> If you're using GPB and LZO, I use
> the com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore.
>
Re: Elephant bird with Pig 0.8
Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
also,
If you're using GPB and LZO, I use
the com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore.
On Mon, Apr 18, 2011 at 8:47 PM, Gerrit Jansen van Vuuren <
gerritjvv@googlemail.com> wrote:
> Hi,
>
> I've used LzoTextLoader in the past and it seems to hang. Haven't looked
> into why.
>
> Please try the com.twitter.elephantbird.pig.load.LzoPigStorage.
>
> Cheers,
> Gerrit
Re: Elephant bird with Pig 0.8
Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
Hi,
I've used LzoTextLoader in the past and it seems to hang. Haven't looked
into why.
Please try the com.twitter.elephantbird.pig.load.LzoPigStorage.
Cheers,
Gerrit