Posted to user@pig.apache.org by Chris Olston <ol...@yahoo-inc.com> on 2008/05/20 21:54:43 UTC
fwd: Pig questions
James: forwarding your mail to the pig-user mailing list (probably a
good idea for you and/or your students to subscribe).
Regarding Hadoop and "hadoop-on-demand", I do not know the answer but
will forward your question to the hadoop guys.
Regarding a function library for Pig that would include Tokenize and
other useful functions, I know that such a library does exist within
Yahoo, and there is an effort underway to create a public library of
Pig functions that would include string manipulations such as
Tokenize, as well as some basic math functionality and other items.
The contact person for this effort is Olga Natkovich
(olgan@yahoo-inc.com) -- perhaps you can send a list of functions
you'd like to see to her, and if all goes well they will go into a
public library over the summer.
Cheers,
Chris
Begin forwarded message:
> From: James Allan <al...@cs.umass.edu>
> Date: May 20, 2008 12:19:59 PM PDT
> To: Chris Olston <ol...@yahoo-inc.com>
> Cc: Rosie Jones <jo...@yahoo-inc.com>
> Subject: Re: PIG requests/suggestions/complaints?
>
> Chris,
>
> After months of distractions, we're getting back to using PIG for
> some projects this summer. I have an urgent question about hadoop
> and then a less urgent question about PIG for you.
>
> The urgent question regards getting hadoop running on a cluster
> that has the grid engine running. Our problem is that "hadoop on
> demand" uses torque rather than grid engine (which we use here).
> We're trying to hack h.o.d. to use grid engine, but are running
> into problems. We're wondering if there's someone we could talk
> with about that problem.
>
> The less urgent question involves PIG and utility functions. Our
> biggest problem with PIG is that the "obvious" (to us)
> functionality that one would expect is missing. For example, we
> can't find a way to use PIG to count the occurrences of every word
> token in a text file--viz., we can't tokenize in PIG. To deal with
> it, we're writing our own little modules to extend PIG (as we can
> starting with 1.2). My question is.... is there a library of such
> added functionality? If not, is there a plan to create such a
> repository?
>
> Thanks.
>
> -- james
--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research
RE: Pig questions
Posted by Amir Youssefi <am...@yahoo-inc.com>.
Olga,
Please let us know when the contrib part is ready so I can kick off
pushing a selection of the many UDFs we have at Yahoo to contrib.
Regards
Amir
-----Original Message-----
From: Chris Olston [mailto:olston@yahoo-inc.com]
Sent: Tuesday, May 20, 2008 12:55 PM
To: pig-user@incubator.apache.org
Cc: allan@cs.umass.edu; Rosie Jones
Subject: fwd: Pig questions