Posted to user@pig.apache.org by Chris Olston <ol...@yahoo-inc.com> on 2008/05/20 21:54:43 UTC

fwd: Pig questions

James: forwarding your mail to the pig-user mailing list (probably a  
good idea for you and/or your students to subscribe).

Regarding Hadoop and "hadoop-on-demand", I do not know the answer but  
will forward your question to the hadoop guys.

Regarding a function library for Pig that would include Tokenize and  
other useful functions, I know that such a library does exist within  
Yahoo, and there is an effort underway to create a public library of  
Pig functions that would include string manipulations such as  
Tokenize, as well as some basic math functionality and other items.  
The contact person for this effort is Olga Natkovich  
(olgan@yahoo-inc.com) -- perhaps you can send a list of functions  
you'd like to see to her, and if all goes well they will go into a  
public library over the summer.

Cheers,

Chris

Begin forwarded message:

> From: James Allan <al...@cs.umass.edu>
> Date: May 20, 2008 12:19:59 PM PDT
> To: Chris Olston <ol...@yahoo-inc.com>
> Cc: Rosie Jones <jo...@yahoo-inc.com>
> Subject: Re: PIG requests/suggestions/complaints?
>
> Chris,
>
> After months of distractions, we're getting back to using PIG for  
> some projects this summer.  I have an urgent question about hadoop  
> and then a less urgent question about PIG for you.
>
> The urgent question regards getting hadoop running on a cluster  
> that has the grid engine running.  Our problem is that "hadoop on  
> demand" uses torque rather than grid engine (which we use here).   
> We're trying to hack h.o.d. to use grid engine, but are running  
> into problems.  We're wondering if there's someone we could talk  
> with about that problem.
>
> The less urgent question involves PIG and utility functions.  Our  
> biggest problem with PIG is that the "obvious" (to us)  
> functionality that one would expect is missing.  For example, we  
> can't find a way to use PIG to count the occurrences of every word  
> token in a text file--viz., we can't tokenize in PIG.  To deal with  
> it, we're writing our own little modules to extend PIG (as we can  
> starting with 1.2).  My question is....  is there a library of such  
> added functionality?  If not, is there a plan to create such a  
> repository?
>
> Thanks.
>
>                                         -- james

--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research
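
For readers following the thread: the operation James describes (split each line of a text file into word tokens, then count how often each token occurs) can be sketched outside Pig as below. This is only an illustration in Python, not the Yahoo library or any Pig API; the function names tokenize and count_words and the token pattern are assumptions made for the example. In Pig terms, the tokenize step is what a Tokenize-style function would supply, and the counting step corresponds to grouping by word and counting each group.

```python
import re
from collections import Counter


def tokenize(line):
    """Split one line into lowercase word tokens.

    Assumption for this sketch: a token is a run of letters,
    digits, or apostrophes; everything else is a separator.
    """
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(lines):
    """Count occurrences of every token across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


if __name__ == "__main__":
    # Hypothetical sample input standing in for a text file.
    sample = ["the quick brown fox", "The lazy dog and the fox"]
    for word, n in count_words(sample).most_common(3):
        print(word, n)
```

To run it on a real file, pass an open file object to count_words, since iterating a file yields its lines one at a time.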



RE: Pig questions

Posted by Amir Youssefi <am...@yahoo-inc.com>.
Olga, 

 Please let us know when the contrib part is ready so I can kick off
pushing a selection of the many UDFs we have at Yahoo to contrib.

Regards
Amir
