Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2009/09/29 17:40:04 UTC

Including PIG release as a mahout dependency.

Arguments for:
1) Pig has a solid Tuple implementation (Writable) and DataBag (a collection
of tuples), which work well over Hadoop:

http://hadoop.apache.org/pig/javadoc/docs/api/org/apache/pig/data/package-summary.html

It also has built-in functions over tuples:

http://hadoop.apache.org/pig/javadoc/docs/api/org/apache/pig/builtin/package-summary.html

2) There is a contribution to Mahout in Pig.

Or I could just lift them from there.

Robin

Re: Including PIG release as a mahout dependency.

Posted by zaki rahaman <za...@gmail.com>.

Which contrib is in Pig? I remember seeing something related to Hofmann's
PLSI. Is that what you're referring to?

And I see no problem with including it, other than Pig being very fickle
about Hadoop versions. For me personally, it's been frustrating to have
scripts break when running locally vs. on EC2 because of quirks between
version numbers, but I believe this is being worked on. What would be even
neater, I think, is being able to drive Mahout jobs and functions from Pig
(or at the very least store processed data in formats that Mahout can use).
No easy task, but I can dream, can't I?

On Tue, Sep 29, 2009 at 11:47 AM, Ted Dunning <te...@gmail.com> wrote:

> Other arguments for:
>
> 1) Pig is a very nice way to express lots of algorithms: cooccurrence and
> cross-occurrence counting in 20-30 lines, that sort of thing.
>
> Cons:
>
> Pig is pickier than any of our other code about Hadoop versions.
>
> On Tue, Sep 29, 2009 at 8:40 AM, Robin Anil <ro...@gmail.com> wrote:
>
> > Arguments for:
> > 1) Pig has a solid Tuple implementation (Writable) and
> > DataBag (a collection of tuples), which work well over Hadoop
> >
> > http://hadoop.apache.org/pig/javadoc/docs/api/org/apache/pig/data/package-summary.html
> >       Built-in functions over tuples.
> >
> > http://hadoop.apache.org/pig/javadoc/docs/api/org/apache/pig/builtin/package-summary.html
> >
> > 2) There is a contribution to Mahout in Pig.
> >
> > Or I could just lift them from there.
> >
> > Robin
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

-- 
Zaki Rahaman

Re: Including PIG release as a mahout dependency.

Posted by Ted Dunning <te...@gmail.com>.

Other arguments for:

1) Pig is a very nice way to express lots of algorithms: cooccurrence and
cross-occurrence counting in 20-30 lines, that sort of thing.
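
For readers unfamiliar with the two counts mentioned above, here is a minimal
in-memory sketch. It is plain Python, not Pig Latin and not anything from
Mahout; the function names and sample data are hypothetical, and reading
"cross occurrence" as pairing items from two aligned collections is an
assumption made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so every pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def cross_occurrence_counts(left_baskets, right_baskets):
    """Count pairs (a, b) with a from a left basket and b from the
    right basket at the same position (assumed meaning, see lead-in)."""
    counts = Counter()
    for left, right in zip(left_baskets, right_baskets):
        for a in set(left):
            for b in set(right):
                counts[(a, b)] += 1
    return counts


# Hypothetical sample data.
baskets = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
pair_counts = cooccurrence_counts(baskets)

views = [["x"], ["x", "y"]]
purchases = [["a"], ["a", "b"]]
cross_counts = cross_occurrence_counts(views, purchases)
```

In a Pig script the same idea would typically be a group, a self-join or
flatten, and a count, which is where the brevity comes from.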

Cons:

Pig is pickier than any of our other code about Hadoop versions.

On Tue, Sep 29, 2009 at 8:40 AM, Robin Anil <ro...@gmail.com> wrote:

> Arguments for:
> 1) Pig has a solid Tuple implementation (Writable) and
> DataBag (a collection of tuples), which work well over Hadoop
>
> http://hadoop.apache.org/pig/javadoc/docs/api/org/apache/pig/data/package-summary.html
>       Built-in functions over tuples.
>
> http://hadoop.apache.org/pig/javadoc/docs/api/org/apache/pig/builtin/package-summary.html
>
> 2) There is a contribution to Mahout in Pig.
>
> Or I could just lift them from there.
>
> Robin
>

-- 
Ted Dunning, CTO
DeepDyve