Posted to user@pig.apache.org by jr <jo...@io-consulting.net> on 2010/06/16 15:10:51 UTC

simple way to REPLACE on various columns

Hello Everybody,
I'm looking for a way to run REPLACE on multiple columns of a dataset, to
escape characters that would otherwise break loading once the data has been
processed in Pig.
Is there an easy way to do that without writing one REPLACE per column, like
this?

FOREACH x GENERATE REPLACE(a, 'char', '\\char'), REPLACE(b, 'char', '\\char'), REPLACE(c, ...), etc.

Johannes
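The goal here, applying one escaping rule to every field instead of naming each column, can be modelled in plain Python. This is an illustration only, not Pig code or a Pig API. The helper names `escape_field` and `escape_fields`, and the choice of a comma as the troublesome character, are assumptions. The sketch also escapes the backslash itself, so an escaped value can later be unescaped unambiguously; a single REPLACE per column does not do that on its own.

```python
# Plain-Python illustration (not Pig): apply the same escaping to every field
# of a row. escape_field / escape_fields are hypothetical helper names.

def escape_field(value, special=",", escape="\\"):
    # Escape the escape character too, so the output can be unescaped
    # unambiguously later on.
    targets = set(special) | {escape}
    return "".join(escape + ch if ch in targets else ch for ch in value)


def escape_fields(row, special=",", escape="\\"):
    # Strings are escaped; other field types (numbers, None) pass through.
    return tuple(
        escape_field(field, special, escape) if isinstance(field, str) else field
        for field in row
    )


rows = [
    ("a,b", "plain", 42),
    ("c:\\tmp", "1,2,3", None),
]
for row in rows:
    print(escape_fields(row))
```

Passing several characters in `special` (for example `",;"`) escapes all of them in one pass over each field.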


Re: simple way to REPLACE on various columns

Posted by hc busy <hc...@gmail.com>.
Yeah, an APPLY function like that would be really cool. Another way to express
this (and to make a map-reduce style interface available in Pig) is to allow
FOREACH to be nested:


TRIMMED_TABLE = FOREACH TABLE {
     stripped = FOREACH TABLE.SOME_BAG GENERATE String.Trim(value);
     GENERATE k1, k2, k3, stripped;
}
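The nested FOREACH proposed here can be read as: for each row, run an inner FOREACH over that row's bag, then GENERATE the key columns alongside the transformed bag. The plain-Python model below follows that reading; it is not Pig code. The field names `k1`, `k2`, `k3`, `some_bag` and `value` mirror the pseudocode, `trim_bag` is a hypothetical name, representing a bag as a list of one-field tuples is an assumption, and Python's `str.strip()` stands in for the `String.Trim` call.

```python
# Plain-Python model (not Pig) of the proposed nested FOREACH: for each row,
# keep the key columns and transform every value in that row's inner bag.
# trim_bag and the field names are hypothetical, mirroring the pseudocode.

def trim_bag(table):
    result = []
    for row in table:
        # Inner FOREACH: runs once per tuple in this row's bag.
        stripped = [(value.strip(),) for (value,) in row["some_bag"]]
        # Outer GENERATE: carry the keys through alongside the new bag.
        result.append((row["k1"], row["k2"], row["k3"], stripped))
    return result


table = [
    {"k1": 1, "k2": "a", "k3": True, "some_bag": [("  x ",), ("y  ",)]},
    {"k1": 2, "k2": "b", "k3": False, "some_bag": []},
]
for row in trim_bag(table):
    print(row)
```

An empty bag simply produces an empty list, so every input row still yields exactly one output row.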

(In reply to Dmitriy Ryaboy's message of Wed, Jun 16, 2010 at 12:08 PM, which appears below.)

Re: simple way to REPLACE on various columns

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
We really need an APPLY function (really, it's map, but I don't want to
overload the term), which would take a funcspec and apply it to every
element in a tuple or bag. Then you could say

FOREACH x GENERATE APPLY REPLACE(*, 'char', '\\char') TO (a, b, c);

That would be rad, and especially useful when dealing with the bags produced
by grouping, and with projections on such bags.
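One way to read the proposed `APPLY REPLACE(*, ...) TO (a, b, c)` syntax is that `*` is a placeholder filled in with each listed element in turn. The plain-Python model below follows that reading; it is not Pig code, and `STAR`, `apply_to` and `replace` are hypothetical names (`replace` is a simple stand-in for Pig's REPLACE on a single string). The same helper also covers the bag case, with a bag modelled here as a plain list of values.

```python
# Plain-Python model (not Pig) of the proposed APPLY ... TO construct:
# a function spec whose "*" placeholder is filled with each element in turn.
# STAR, apply_to and replace are hypothetical names for illustration.

STAR = object()  # sentinel standing in for the "*" placeholder


def replace(s, old, new):
    # Stand-in for Pig's REPLACE applied to one string.
    return s.replace(old, new)


def apply_to(func, arg_template, elements):
    """Call func once per element, substituting the element for STAR."""
    out = []
    for element in elements:
        args = [element if arg is STAR else arg for arg in arg_template]
        out.append(func(*args))
    return tuple(out)


# Models: APPLY REPLACE(*, ',', '\\,') TO (a, b, c)
a, b, c = "1,2", "no commas", "x,y,z"
print(apply_to(replace, (STAR, ",", "\\,"), (a, b, c)))

# The same idea over a bag, e.g. one produced by grouping.
bag = ["p,q", "r"]
print(apply_to(replace, (STAR, ",", "\\,"), bag))
```

Using a unique `object()` as the placeholder means no real argument value (such as the string `"*"`) can be mistaken for it.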

-D
