You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@pig.apache.org by James Kebinger <jk...@gmail.com> on 2011/08/29 20:15:22 UTC

How to coalesce fields in Pig?

My apologies if this is in the docs somewhere, I was unable to find
anything, but I might be calling it the wrong name.

I'm doing a full outer join in Pig - as such, one or the other join keys may
be null. I'd like to be able to look at 2 columns, and retrieve just the one
that is not null. Is that possible?

I tried an expression in generate with is null and the ternary operator, and
took a look at DECODE. That might do the trick but wasn't sure if null
checking would work, and if other expressions could appear inside the
decode.

In my case the fields are integers, so I abused the MAX and TOBAG operators
like this MAX(TOBAG(rx_keyed::u2,cx_keyed::u2)) to get the effect I was
after, but I would love to know if there's a better way.

Thanks for your time!

-James Kebinger

Re: How to coalesce fields in Pig?

Posted by James Kebinger <jk...@gmail.com>.
Thanks, it must have been the lack of parenthesis that did me in when i
tried the ternary expression, or some other typo. I'll use that in the
future.

On Mon, Aug 29, 2011 at 2:29 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Hi James,
> I use ternary expressions for this: foreach joined generate ( rel1.x is
> null
> ? rel2.x : rel1.x) as x;
>
> On Mon, Aug 29, 2011 at 11:15 AM, James Kebinger <jkebinger@gmail.com
> >wrote:
>
> > My apologies if this is in the docs somewhere, I was unable to find
> > anything, but I might be calling it the wrong name.
> >
> > I'm doing a full outer join in Pig - as such, one or the other join keys
> > may
> > be null. I'd like to be able to look at 2 columns, and retrieve just the
> > one
> > that is not null. Is that possible?
> >
> > I tried an expression in generate with is null and the ternary operator,
> > and
> > took a look at DECODE. That might do the trick but wasn't sure if null
> > checking would work, and if other expressions could appear inside the
> > decode.
> >
> > In my case the fields are integers, so I abused the MAX and TOBAG
> operators
> > like this MAX(TOBAG(rx_keyed::u2,cx_keyed::u2)) to get the effect I was
> > after, but I would love to know if there's a better way.
> >
> > Thanks for your time!
> >
> > -James Kebinger
> >
>

Re: How to coalesce fields in Pig?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Hi James,
I use ternary expressions for this: foreach joined generate ( rel1.x is null
? rel2.x : rel1.x) as x;

On Mon, Aug 29, 2011 at 11:15 AM, James Kebinger <jk...@gmail.com>wrote:

> My apologies if this is in the docs somewhere, I was unable to find
> anything, but I might be calling it the wrong name.
>
> I'm doing a full outer join in Pig - as such, one or the other join keys
> may
> be null. I'd like to be able to look at 2 columns, and retrieve just the
> one
> that is not null. Is that possible?
>
> I tried an expression in generate with is null and the ternary operator,
> and
> took a look at DECODE. That might do the trick but wasn't sure if null
> checking would work, and if other expressions could appear inside the
> decode.
>
> In my case the fields are integers, so I abused the MAX and TOBAG operators
> like this MAX(TOBAG(rx_keyed::u2,cx_keyed::u2)) to get the effect I was
> after, but I would love to know if there's a better way.
>
> Thanks for your time!
>
> -James Kebinger
>