Posted to user@pig.apache.org by David Lapsley <da...@openx.com> on 2012/09/08 00:41:05 UTC

Applying schemas after flatten?

Hi Folks:

I am new to the Pig world. I have been using it for about a week, and I am completely blown away by how good it is.

I have a question about Schemas. I have a processing chain similar to the following:

A = LOAD 'data' USING PigStorage('\u0001') AS (y:chararray, cust1:int, cust2:int);
B = FOREACH A GENERATE (y, {(cust1), (cust2)}) AS t: tuple(y, CUSTS);
C = FOREACH B GENERATE(t.y, FLATTEN(t.CUSTS));

So, basically, each row of my raw data contains some common data plus several customer records. I would like to "explode" each row so that I get one row per customer, with each row also carrying the common data.

The code above does this; however, I am not able to supply a schema for C. Whenever I try, I get an error about mismatched schemas.
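For comparison, here is a minimal, untested sketch of the same "explode" that declares the schema directly on the flattened output. It assumes the same '\u0001'-delimited input and Pig's built-in TOBAG function, and it skips the intermediate relation B. It also drops the parentheses around the GENERATE list, because those parentheses wrap everything in a single tuple field, which may be part of what makes the schema hard to declare.

```pig
-- Sketch (untested): one output row per customer, with an explicit schema on C.
-- Assumes the same '\u0001'-delimited input as above and Pig's built-in TOBAG.
A = LOAD 'data' USING PigStorage('\u0001') AS (y:chararray, cust1:int, cust2:int);

-- TOBAG(cust1, cust2) builds the bag {(cust1), (cust2)};
-- FLATTEN emits one row per tuple in that bag, repeating y on each row.
-- With no parentheses around the GENERATE list, C has two top-level fields,
-- so the flattened field can be named and typed with AS.
C = FOREACH A GENERATE y, FLATTEN(TOBAG(cust1, cust2)) AS cust:int;

-- Print the schema Pig infers for C.
DESCRIBE C;
```

Running DESCRIBE on C is a quick way to confirm the field names and types before adding further steps.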

I would greatly appreciate any pointers you may have.

Best regards,

Dave.


Re: Applying schemas after flatten?

Posted by TianYi Zhu <ti...@facilitatedigital.com>.
Hi Dave,

Try:

C = FOREACH B GENERATE(t.y, FLATTEN(t.CUSTS) AS (anothery:chararray, custbag:bag));
