Posted to user@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2010/03/29 23:51:33 UTC

Elephant Bird released

Hi folks,
We (but mostly Kevin Weil) just open-sourced some of the code we use at
Twitter to make working with Hadoop and Pig easier. Most of what is
currently included in "Elephant Bird" deals with generating Input/Output
formats for LZO-compressed protocol buffers, Pig LoadFuncs and StoreFuncs
for the same; there are also some handy loaders for LZO-compressed stuff
that is not protobuf-based.

The project is on github: http://github.com/kevinweil/elephant-bird/

Kevin presented on some of this at HUG recently:
http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709

Feedback, bug reports, and patches are welcome! Hope you find this useful.

-Dmitriy

Re: Elephant Bird released

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
ElephantBird now also contains UDFs for dynamically invoking (a subset of)
Java functions that operate on basic classes like Integers, Doubles,
Strings, etc., without having to write custom UDFs every time. This will be
native to Pig 0.8, but for now you can use the same functionality in 0.5+ by
including the elephant-bird jar.

From the javadoc:
----------------------------------------

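This UDF allows one to dynamically invoke Java methods that return a T (for example, InvokeForLong for methods returning a Long, InvokeForString for methods returning a String).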

Usage of the Invoker family of UDFs (adjust as appropriate):

 -- invoking a static method
 DEFINE StringToLong InvokeForLong('java.lang.Long.valueOf', 'String');
 longs = FOREACH strings GENERATE StringToLong(some_chararray);

 -- invoking a method on an object
 DEFINE StringConcat InvokeForString('java.lang.String.concat', 'String String', 'false');
 concatenations = FOREACH strings GENERATE StringConcat(str1, str2);

The first argument to the constructor is the full path to the desired method.
The second argument is a space-separated list of the classes of the method
parameters. If the method is not static, the first element in this list is
the class of the object to invoke the method on.
The third argument is the keyword "static" (or "true") to signify that the
method is static. It is optional, and true by default.

----------------------------------------
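Dynamic invocation of this kind is usually done with Java reflection. As a rough illustration only, here is a minimal plain-Java sketch of the argument conventions described above: a full method path, a space-separated list of parameter classes (with the receiver's class first for non-static methods), and an optional static flag that defaults to true. The class ReflectiveInvoker and its small table of short type names are hypothetical, made up for this example; this is not Elephant Bird's implementation.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reflection-based dynamic invocation in the spirit of
// the Invoker UDFs; not Elephant Bird's actual implementation.
final class ReflectiveInvoker {
    // Assumed subset of short type names accepted in the parameter list.
    private static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("String", String.class);
        TYPES.put("Integer", Integer.class);
        TYPES.put("Long", Long.class);
        TYPES.put("Double", Double.class);
        TYPES.put("int", int.class);
        TYPES.put("long", long.class);
        TYPES.put("double", double.class);
    }

    private final Method method;
    private final boolean isStatic;

    ReflectiveInvoker(String fullMethodName, String paramSpec, String staticFlag) {
        int dot = fullMethodName.lastIndexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("Expected Class.method, got " + fullMethodName);
        }
        String className = fullMethodName.substring(0, dot);
        String methodName = fullMethodName.substring(dot + 1);
        // Optional flag: absent, "static", or "true" all mean a static method.
        this.isStatic = staticFlag == null
                || staticFlag.equals("static") || staticFlag.equals("true");

        String trimmed = paramSpec.trim();
        String[] names = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        // For instance methods the first listed class is the receiver,
        // so it is not part of the method's parameter signature.
        int skip = isStatic ? 0 : 1;
        if (names.length < skip) {
            throw new IllegalArgumentException("Instance methods need the receiver class first");
        }
        Class<?>[] paramTypes = new Class<?>[names.length - skip];
        for (int i = skip; i < names.length; i++) {
            Class<?> c = TYPES.get(names[i]);
            if (c == null) {
                throw new IllegalArgumentException("Unsupported type: " + names[i]);
            }
            paramTypes[i - skip] = c;
        }

        Method resolved;
        try {
            resolved = Class.forName(className).getMethod(methodName, paramTypes);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            throw new IllegalArgumentException("Cannot resolve " + fullMethodName, e);
        }
        // Reject a flag that disagrees with the method's actual modifiers.
        if (Modifier.isStatic(resolved.getModifiers()) != isStatic) {
            throw new IllegalArgumentException("Static flag does not match " + fullMethodName);
        }
        this.method = resolved;
    }

    // For instance methods, args[0] is the receiver; the rest are parameters.
    Object invoke(Object... args) {
        try {
            if (isStatic) {
                return method.invoke(null, args);
            }
            Object[] rest = Arrays.copyOfRange(args, 1, args.length);
            return method.invoke(args[0], rest);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Invocation failed", e);
        }
    }

    public static void main(String[] args) {
        ReflectiveInvoker toLong =
                new ReflectiveInvoker("java.lang.Long.valueOf", "String", null);
        System.out.println(toLong.invoke("42"));

        ReflectiveInvoker concat =
                new ReflectiveInvoker("java.lang.String.concat", "String String", "false");
        System.out.println(concat.invoke("foo", "bar"));
    }
}
```

Running main should print 42 and then foobar, mirroring the StringToLong and StringConcat examples from the javadoc. Resolving the Method once in the constructor, rather than on every call, keeps the per-record cost down to a single reflective invoke.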

-Dmitriy





Re: Elephant Bird released

Posted by Alan Gates <ga...@yahoo-inc.com>.
I added a link to this on http://wiki.apache.org/pig/PigTools

Alan.



Re: Elephant Bird released

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Rohan,
Yes. I think. Let us know if it is not.

-Dmitriy

On Mon, Mar 29, 2010 at 8:42 PM, Rohan Rai <ro...@inmobi.com> wrote:

> Hey...
>
> I am so excited seeing this...
> I am at the edge of my seat...
> I can't even wait to see what it is...
> So just looking and hoping for a heads-up...
> Is this the same thing for which people with requirement
> of compressed format and compatibility with pig were waiting for...
>
> Regards
> Rohan
>
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify us
> immediately by responding to this email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of the
> information contained in this communication nor for any delay in its
> receipt.
>

Re: Elephant Bird released

Posted by Rohan Rai <ro...@inmobi.com>.
Hey...

I am so excited seeing this...
I am at the edge of my seat...
I can't even wait to see what it is...
So just looking and hoping for a heads-up...
Is this the same thing for which people with requirement
of compressed format and compatibility with pig were waiting for...

Regards
Rohan


Re: Elephant Bird released

Posted by 김영우 <wa...@gmail.com>.
Awesome!

Thank you to all the contributors.

-Youngwoo


Re: Elephant Bird released

Posted by jr <jo...@io-consulting.net>.
Sorry for the post, everyone!
johannes


Re: Elephant Bird released

Posted by jr <jo...@io-consulting.net>.
Hello Kevin,
I hope it's alright if I reply about this off the list, since I don't
think it'd be helpful to the list for now.
I'm trying to figure out what has to be compiled and how, and the first thing I
found is the package com.twitter.data.proto.BlockStorage.
I can only find the javadoc for it and a few .java files importing
from that package.
Is that generated by protobuf? Unfortunately I'm not familiar with
protobuf at all, so I'm not even sure how to generate that package :)
What do I need to generate/get that?
Johannes

On Thursday, April 1, 2010, at 07:37 -0700, Kevin Weil wrote:
> Johannes, it does require protobuf 2.3. All of the InputFormats, Pig
> loaders, etc. will themselves work on earlier versions of the protobuf
> library (we began on 2.2), but the protobuf codegen uses 2.3's new compiler
> plugin API. If you don't need that, you should be able to use 2.2 with a
> little hand editing.
> 
> HTH,
> Kevin


Re: Elephant Bird released

Posted by Kevin Weil <ke...@gmail.com>.
Johannes,

If you want to contribute a patch to the build file with a "no-protobuf"
target, please do, and send me a GitHub pull request. I bet you aren't the
only one who will want this.

Thanks,
Kevin

On Thu, Apr 1, 2010 at 7:58 AM, jr <jo...@io-consulting.net> wrote:

> Hello Kevin,
> thanks a lot; since I really only need the Pig loaders, I'll go for hand
> editing :)
> Johannes

Re: Elephant Bird released

Posted by jr <jo...@io-consulting.net>.
Hello Kevin,
thanks a lot; since I really only need the Pig loaders, I'll go for hand
editing :)
Johannes



Re: Elephant Bird released

Posted by Kevin Weil <ke...@gmail.com>.
Johannes, it does require protobuf 2.3. All of the InputFormats, Pig
loaders, etc. will themselves work on earlier versions of the protobuf
library (we began on 2.2), but the protobuf codegen uses 2.3's new compiler
plugin API. If you don't need that, you should be able to use 2.2 with a
little hand editing.

HTH,
Kevin

On Thu, Apr 1, 2010 at 3:08 AM, jr <jo...@io-consulting.net> wrote:

> Hi Dmitriy,
> does this require protobuf 2.3? I'm trying to build it on Fedora and it
> fails; I think it's because only 2.2 is available on Fedora.
> Best regards,
> Johannes

Re: Elephant Bird released

Posted by jr <jo...@io-consulting.net>.
Hi Dmitriy,
does this require protobuf 2.3? I'm trying to build it on Fedora and it
fails; I think it's because only 2.2 is available on Fedora.
Best regards,
Johannes
