Posted to user@pig.apache.org by Peter Gieser <pe...@intentmedia.com> on 2012/04/10 08:26:32 UTC

"duplicate uid in schema" feature or bug?

I have created a bug (https://issues.apache.org/jira/browse/PIG-2636) based on the following (simplified) script:

A = LOAD 'bug.in' AS a:tuple(x:int, y:int);
B1 = FOREACH A GENERATE a.x, a.y;
B2 = FOREACH A GENERATE a.x, a.y;
C = JOIN B1 BY x, B2 BY x;

that yields the following error:

org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : B1::x#35:int,B1::y#36:int,B2::x#35:int,B2::y#36:int


I assumed this was a bug, but perhaps Pig is not meant to support this? If it turns out to be unsupported, is there an easy way to achieve the same result?

Thanks,
Pete

Re: "duplicate uid in schema" feature or bug?

Posted by Norbert Burger <no...@gmail.com>.

Not sure if this will work for your use case, but adding a FLATTEN to strip
the outer tuple before the FOREACH statements seems to be enough to work
around the bug:

B = FOREACH A GENERATE FLATTEN(a);
B1 = FOREACH B GENERATE x, y;
B2 = FOREACH B GENERATE x, y;
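
For reference, here is a sketch of the original script with this workaround applied end to end. It is untested and only combines the LOAD and JOIN lines from the original post with the three lines above; the DUMP at the end is an addition for inspecting the result, and whether the plan validates may depend on the Pig version in use.

```pig
-- Same input and schema as in the original report
A = LOAD 'bug.in' AS a:tuple(x:int, y:int);

-- Flatten the tuple once, so x and y become top-level fields of B
B = FOREACH A GENERATE FLATTEN(a);

-- Project from the flattened relation instead of dereferencing a.x and a.y
B1 = FOREACH B GENERATE x, y;
B2 = FOREACH B GENERATE x, y;

-- The self-join from the original script
C = JOIN B1 BY x, B2 BY x;

-- Added for illustration only: print the joined rows
DUMP C;
```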

Norbert

On Tue, Apr 10, 2012 at 2:42 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> This is indeed a bug (and a pretty nasty, if infrequent, one). Thank you
> for filing the JIRA!

Re: "duplicate uid in schema" feature or bug?

Posted by Jonathan Coveney <jc...@gmail.com>.

This is indeed a bug (and a pretty nasty, if infrequent, one). Thank you
for filing the JIRA!
