Posted to user@pig.apache.org by Matthew Hayes <mh...@linkedin.com> on 2011/09/27 19:19:34 UTC

DataFu library

Hi Pig users,

I wanted to share with you all that we recently open-sourced a library we have been developing at LinkedIn.  It collects many of the useful UDFs we have developed for products such as "People You May Know" and "Skills".  There are UDFs for median, quantiles, set operations, bag operations, PageRank, etc.  All the UDFs are pretty well documented and unit tested, and we are also tracking code coverage.  It would be great to get people's feedback on it.  Also, if anyone would like to contribute, please let us know :)

Project page:
http://sna-projects.com/datafu/

Github:
https://github.com/linkedin/datafu

Thanks,
Matt Hayes


Re: DataFu library

Posted by Norbert Burger <no...@gmail.com>.
Hi Matt and the rest of the LinkedIn team:

Thanks for open-sourcing this project!

Norbert

On Tue, Sep 27, 2011 at 1:35 PM, Lakshminarayana Motamarri <narayana.gupta123@gmail.com> wrote:

> Thank you for sharing the UDFs...

Re: DataFu library

Posted by Lakshminarayana Motamarri <na...@gmail.com>.
Thank you for sharing the UDFs...
