Posted to user@pig.apache.org by Chaitanya Sharma <go...@gmail.com> on 2011/04/18 20:23:21 UTC

Elephant bird with Pig 0.8

Hi,

I am trying to get LZO support working for my little Pig 0.8 project.
I'm using https://github.com/gerritjvv/elephant-bird.git for the
Pig LZO loaders and https://github.com/kevinweil/hadoop-lzo for the Hadoop
LZO support.

The Pig loader
com.twitter.elephantbird.mapreduce.input.LzoTextLoader doesn't seem to
be up to the job: the MapReduce job it creates never progresses
beyond 0%.


Has anyone been successful at this? If so, please share your experience.

Or which patch levels / git source branches should I be using to get this to
work?


Thanks,
Chaitanya

Re: Elephant bird with Pig 0.8

Posted by Chaitanya Sharma <go...@gmail.com>.
Hi Gerrit,

I don't know about GPB, but I am using LZO, with
com.twitter.elephantbird.pig8.load.LzoTextLoader.

Thanks
chaitanya


Please find attached the Hadoop mapred/core site configuration files and a
Pig error log.

The error log closely resembles the one in the
http://www.mail-archive.com/user@pig.apache.org/msg01009.html mail thread,
but the suggestions made there don't seem to work.

My environment configuration / versions are below:

csharma $ hadoop version
Hadoop 0.20.2-cdh3u0
Subversion  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14
Compiled by root on Sat Mar 26 00:12:30 UTC 2011
From source with checksum 6c1f62dddc4eac69b6b973c18bbc0f55

csharma $ pig -version
Apache Pig version 0.8.0-cdh3u0 (rexported)
compiled Mar 25 2011, 16:16:24



For LZO I am using the following (with build output snippets):

csharma@hadoop-lzo $ git remote -v
origin     https://github.com/kevinweil/hadoop-lzo.git (fetch)
origin     https://github.com/kevinweil/hadoop-lzo.git (push)

csharma@hadoop-lzo $ tree build/hadoop-lzo-0.4.10/lib/
build/hadoop-lzo-0.4.10/lib/
|-- commons-logging-1.0.4.jar
|-- commons-logging-api-1.0.4.jar
|-- junit-3.8.1.jar
`-- native
    `-- Linux-i386-32
        |-- libgplcompression.a
        |-- libgplcompression.la
        |-- libgplcompression.so -> libgplcompression.so.0.0.0
        |-- libgplcompression.so.0 -> libgplcompression.so.0.0.0
        `-- libgplcompression.so.0.0.0

csharma@hadoop-lzo $ ls -l build/
drwxr-xr-x 6 csharma csharma    4096 2011-04-18 08:11 hadoop-lzo-0.4.10
-rw-r--r-- 1 csharma csharma   59855 2011-04-18 08:11 hadoop-lzo-0.4.10.jar
-rw-r--r-- 1 csharma csharma 1810286 2011-04-18 08:11 hadoop-lzo-0.4.10.tar.gz
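
The native build above produced only a Linux-i386-32 directory. A 32-bit shared object cannot be loaded into a 64-bit JVM, so if the tasktrackers run a 64-bit JVM, the native LZO library would likely fail to load there. As a hedged sanity check (a sketch, not part of hadoop-lzo; `elf_bitness` and `report_native_libs` are hypothetical helper names), the Python below reads the ELF class byte of each libgplcompression.so. It only inspects the build output; it does not test whether Hadoop actually loads the library.

```python
import os

# ELF identification bytes: 0-3 hold the magic b"\x7fELF", and byte 4
# (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit object.
ELF_MAGIC = b"\x7fELF"
ELF_CLASS_BITS = {1: 32, 2: 64}


def elf_bitness(path):
    """Return 32 or 64 for an ELF file, or None if it is not a recognizable ELF file."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return ELF_CLASS_BITS.get(ident[4])


def report_native_libs(native_dir):
    """Print the ELF class of libgplcompression.so in each platform directory."""
    for platform_dir in sorted(os.listdir(native_dir)):
        lib = os.path.join(native_dir, platform_dir, "libgplcompression.so")
        # os.path.exists follows symlinks, so a dangling link also counts as missing.
        if not os.path.exists(lib):
            print(f"{platform_dir}: libgplcompression.so missing")
            continue
        bits = elf_bitness(os.path.realpath(lib))
        print(f"{platform_dir}: {bits}-bit" if bits else f"{platform_dir}: not an ELF file")
```

For example, `report_native_libs("build/hadoop-lzo-0.4.10/lib/native")` prints one line per platform directory; the result can then be compared with the bitness of the JVM the tasktrackers use.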





For Pig 0.8 support, Elephant Bird:

csharma@elephant-bird-gerritjvv $ git remote -v
origin     https://github.com/gerritjvv/elephant-bird.git (fetch)
origin     https://github.com/gerritjvv/elephant-bird.git (push)

Also, https://github.com/dvryaboy/elephant-bird/tree/pig-08:

csharma@ubuntu:~/Projects/elephant-bird-dvryaboy$ git remote -v
origin     https://github.com/dvryaboy/elephant-bird.git (fetch)
origin     https://github.com/dvryaboy/elephant-bird.git (push)



I have also been having a lot of problems building Elephant Bird *without*
Thrift / Protobuf.



I did find this project, http://code.google.com/p/hadoop-gpl-packing/,
which says EB can handle even Pig 0.8.

This confuses me - can I or can I not use Elephant Bird with Pig 0.8?

-------------------------------------------------------------------



I created a sample LZO-compressed data file, clean.txt.lzo, then indexed it
to verify that LZO is installed correctly on my pseudo-distributed cluster:

hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.10.jar com.hadoop.compression.lzo.DistributedLzoIndexer /data/clean.txt.lzo
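
Before suspecting the Pig loaders, it may help to confirm that the input really is an lzop-format file, which is what hadoop-lzo's LzopCodec expects for .lzo files, and that the indexer actually wrote clean.txt.lzo.index next to it. The Python below is a minimal sketch with hypothetical helper names (`is_lzop_file`, `read_lzo_index`), not part of hadoop-lzo. It checks the 9-byte magic that lzop writes at the start of every file, and it assumes the index is a flat sequence of big-endian 64-bit block offsets; treat that index layout as an assumption.

```python
import struct

# The 9-byte magic that starts every file written by the lzop tool.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def is_lzop_file(path):
    """True if the file starts with the lzop magic header."""
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC


def read_lzo_index(index_path):
    """Return the block offsets stored in a .lzo.index file.

    Assumption: the index is a flat sequence of big-endian signed 64-bit
    integers, one per compressed block.
    """
    with open(index_path, "rb") as f:
        data = f.read()
    if len(data) % 8:
        raise ValueError(f"{index_path}: size {len(data)} is not a multiple of 8")
    return list(struct.unpack(f">{len(data) // 8}q", data))
```

For example, after copying both files locally (e.g. with `hadoop fs -get`), `is_lzop_file("clean.txt.lzo")` should be true and `read_lzo_index("clean.txt.lzo.index")` should return the block offsets; a missing index or a failed magic check would point at the compression or indexing step rather than at Elephant Bird.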



But when running Pig jobs with the Pig LZO loaders from Elephant Bird, I've
always run into problems: the following either just crashes or keeps
running forever, spitting out GBs of garbage data.

grunt> register /usr/lib/hadoop/lib/elephant-bird-2.0-SNAPSHOT.jar;
grunt> d = load '/data/clean.txt.lzo' using com.twitter.elephantbird.pig8.load.LzoTextLoader();

This throws a boatload of errors (attached).




Please let me know how my configuration / versions conflict, and which
of them I should change to get this working.






On Mon, Apr 18, 2011 at 2:52 PM, Gerrit Jansen van Vuuren <
gerritjvv@googlemail.com> wrote:

> also,
>
> If you're using GPB and LZO, I use
> com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore.
>
>
> On Mon, Apr 18, 2011 at 8:47 PM, Gerrit Jansen van Vuuren <
> gerritjvv@googlemail.com> wrote:
>
>> Hi,
>>
>> I've used LzoTextLoader in the past and it seems to hang. I haven't looked
>> into why.
>>
>> Please try the com.twitter.elephantbird.pig.load.LzoPigStorage.
>>
>> Cheers,
>>  Gerrit
>>
>

Re: Elephant bird with Pig 0.8

Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
also,

If you're using GPB and LZO, I use
com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore.



Re: Elephant bird with Pig 0.8

Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
Hi,

I've used LzoTextLoader in the past and it seems to hang. I haven't looked
into why.

Please try the com.twitter.elephantbird.pig.load.LzoPigStorage.

Cheers,
 Gerrit
