Posted to user@pig.apache.org by Evert Lammerts <Ev...@sara.nl> on 2011/01/12 21:10:20 UTC

LZO & Pig (Elephantbird?)

Hello list,

I've installed the LZO codecs (https://github.com/kevinweil/hadoop-lzo) and
now I'm looking into using LZO in Pig. Elephant Bird
(https://github.com/kevinweil/elephant-bird) seems to provide some nice
prefab loaders, but its requirements do not fit our Hadoop installation
(we're on CDH3b2 with Pig 0.7; EB cannot be used with anything > 0.6). Also,
the need for Thrift 0.2 is unclear to me - Thrift is now at 0.5.

Now I did find this project, http://code.google.com/p/hadoop-gpl-packing/,
saying EB can handle even Pig 0.8. This confuses me - can I or can I not use
Elephant Bird with Pig 0.7, or even upgrade to Pig 0.8?

Since EB is probably not an option, does anybody have some pointers on how
to use LZO'ed files with Pig?

Thanks!

Evert Lammerts

Re: LZO & Pig (Elephantbird?)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
P.S. Thrift 0.2 and 0.5 are binary-compatible, so you can read messages
generated with 0.5 using files compiled with thrift 0.2, and vice versa. We
have some projects that use 0.5 and some that are still on 0.2, and all that
means is that you install both versions of the compilers on your dev box and
flip your aliases depending on which project you are building.
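
As a rough illustration of the alias flip described above, here is a minimal
Python sketch that picks a Thrift compiler per project. The install paths and
project names are hypothetical placeholders, not taken from this thread; the
only real-world assumption is the usual `thrift --gen <lang> <file>`
invocation.

```python
# Hypothetical install locations for the two Thrift compilers (assumptions).
THRIFT_COMPILERS = {
    "0.2": "/opt/thrift-0.2/bin/thrift",
    "0.5": "/opt/thrift-0.5/bin/thrift",
}

# Hypothetical mapping from project name to the Thrift version it builds with.
PROJECT_THRIFT_VERSION = {
    "elephant-bird": "0.2",
    "newer-service": "0.5",
}


def thrift_command(project, idl_file, lang="java"):
    """Return the compiler invocation (as an argv list) for a project."""
    version = PROJECT_THRIFT_VERSION[project]
    compiler = THRIFT_COMPILERS[version]
    # Build the usual code-generation invocation: thrift --gen <lang> <file>
    return [compiler, "--gen", lang, idl_file]


if __name__ == "__main__":
    # Print the command line that would be run for one project.
    print(" ".join(thrift_command("elephant-bird", "messages.thrift")))
```

The returned list can be handed to `subprocess.run` from a build script, which
avoids depending on whichever `thrift` happens to be first on the PATH.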

On Wed, Jan 12, 2011 at 2:44 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> I am working on the pig 08 compatibility layer; it mostly works, fwiw.
> Converting to Thrift 0.5 would be fairly straightforward; unfortunately the
> signatures of Thrift messages changed so the code is not entirely backwards
> compatible. I don't think the changes for what we do with Pig are material.
>
> Are you trying to load protobufs or thrift files, or do you just want Lzo
> support? If you just want plain text lzo loading, the loaders in the pig-08
> branch totally work.
>
> Let me know if you have any issues.
>
> D

Re: LZO & Pig (Elephantbird?)

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yes, I figured you guys had gone far ahead since I last checked it.
Our code has been spinning for the past 7 months or so; I think that was
before we had a chance to figure out all the details of EB.

On Thu, Jan 20, 2011 at 12:41 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> FWIW we don't do codegen anymore either, both in the 0.6 and the
> 0.8-compatible branches.
> Pointing to a description file is a good idea, we'll add that.
>
> D

Re: LZO & Pig (Elephantbird?)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
FWIW we don't do codegen anymore either, both in the 0.6 and the
0.8-compatible branches.
Pointing to a description file is a good idea, we'll add that.

D

On Thu, Jan 20, 2011 at 12:16 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> We just open-sourced some load and store funcs for Pig 0.7 on CDH3b3 that
> read and write protobufs from/to sequence files and HBase; we actually use
> them in production. There's no codegen, and I guess they do not support LZO
> files directly (though one might enable LZO inside sequence files if
> needed). It works rather nicely for us. Some minor adjustments may be
> needed, since we use Grunt integrated into our redundant clients rather
> than spinning off Grunt on its own. We haven't switched to 0.8 yet, but I
> gather the API gap for loadfuncs is narrower between 0.7 and 0.8 than
> between 0.6 and 0.7 (that tree also has some decommissioned funcs for 0.6
> that we no longer use):
> https://github.com/dlyubimov/ecoadapters

Re: LZO & Pig (Elephantbird?)

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
We just open-sourced some load and store funcs for Pig 0.7 on CDH3b3 that
read and write protobufs from/to sequence files and HBase; we actually use
them in production. There's no codegen, and I guess they do not support LZO
files directly (though one might enable LZO inside sequence files if
needed). It works rather nicely for us. Some minor adjustments may be
needed, since we use Grunt integrated into our redundant clients rather
than spinning off Grunt on its own. We haven't switched to 0.8 yet, but I
gather the API gap for loadfuncs is narrower between 0.7 and 0.8 than
between 0.6 and 0.7 (that tree also has some decommissioned funcs for 0.6
that we no longer use):
https://github.com/dlyubimov/ecoadapters



On Thu, Jan 13, 2011 at 7:43 AM, Evert Lammerts <Ev...@sara.nl> wrote:

> > Are you trying to load protobufs or thrift files, or do you just want Lzo
> > support?
>
> Protobufs would be nice, but Elephant Bird is not ready yet for Pig 0.7 /
> 0.8, right?
>
>

Re: LZO & Pig (Elephantbird?)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Depends on your definition of ready. I haven't put it into production and
there's a bit of clean-up left, but as far as I know there are no critical
bugs; just some interface issues which are shared between the current master
and the version for Pig 0.8.

Pig 0.8 should work with Hadoop 0.20.1; I've been working with it under CDH2
(which is a patched 0.20.1).

-Dmitriy

On Thu, Jan 13, 2011 at 7:43 AM, Evert Lammerts <Ev...@sara.nl> wrote:

> > Are you trying to load protobufs or thrift files, or do you just want Lzo
> > support?
>
> Protobufs would be nice, but Elephant Bird is not ready yet for Pig 0.7 /
> 0.8, right?
>
> > If you just want plain text lzo loading, the loaders in the pig-08
> > branch totally work.
>
> Does Pig 0.8 work with Hadoop 0.20.1 as well?
>
> Thanks for the support!
> Evert

RE: LZO & Pig (Elephantbird?)

Posted by Evert Lammerts <Ev...@sara.nl>.
> Are you trying to load protobufs or thrift files, or do you just want Lzo
> support?

Protobufs would be nice, but Elephant Bird is not ready yet for Pig 0.7 /
0.8, right?

> If you just want plain text lzo loading, the loaders in the pig-08
> branch totally work.

Does Pig 0.8 work with Hadoop 0.20.1 as well?

Thanks for the support!
Evert


Re: LZO & Pig (Elephantbird?)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I am working on the Pig 0.8 compatibility layer; it mostly works, FWIW.
Converting to Thrift 0.5 would be fairly straightforward; unfortunately, the
signatures of Thrift messages changed, so the code is not entirely
backward-compatible. I don't think the changes are material for what we do
with Pig.

Are you trying to load protobufs or thrift files, or do you just want Lzo
support? If you just want plain text lzo loading, the loaders in the pig-08
branch totally work.

Let me know if you have any issues.

D

On Wed, Jan 12, 2011 at 12:23 PM, Tyler Coffin <tc...@rim.com> wrote:

> There's a fork of elephant-bird where pig-8 support is being worked on:
> https://github.com/dvryaboy/elephant-bird/tree/pig-08
>
> I haven't given it a shot yet.

RE: LZO & Pig (Elephantbird?)

Posted by Tyler Coffin <tc...@rim.com>.
There's a fork of elephant-bird where pig-8 support is being worked on:
https://github.com/dvryaboy/elephant-bird/tree/pig-08

I haven't given it a shot yet.
