Posted to user@pig.apache.org by Chun Yang <cy...@contractor.salesforce.com> on 2012/07/06 21:41:53 UTC

Alias Confusion

Hi all,

I'm walking through a Pig script in grunt, but I'm getting stuck on some
issues with nested foreach. I'm using Pig version 0.9.2.

I'm trying to find the number of unique users from the relation 'top100'.

grunt> describe top100
top100: {name: chararray,licenses: long,instance: chararray,transactions: long,users: {(projected::userId: chararray)},runTimes: {(projected::runTime: double)}}


Re: Alias Confusion

Posted by Russell Jurney <ru...@gmail.com>.
Don't reuse relation names as column names or as aliases inside a nested foreach.

Russell Jurney http://datasyndrome.com
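A minimal sketch of that advice applied to the failing statement from the question, assuming top100 has the schema Chun shows. The nested alias gets a name that no top-level relation in the session already uses, so it can only refer to the DISTINCT inside the block. The names distinctUsers and numUniqUsers are illustrative choices, not from the thread; renaming the earlier uniqUsers relation instead should have the same effect.

```pig
-- Assumes top100: {name: chararray, ..., users: {(projected::userId: chararray)}, ...}
-- distinctUsers and numUniqUsers are hypothetical names chosen for this sketch.
-- No top-level relation is called distinctUsers, so the name inside the block
-- refers to the nested DISTINCT, not to an outer relation used as a scalar.
uu = foreach top100 {
    distinctUsers = distinct users;
    generate name, COUNT(distinctUsers) as numUniqUsers;
}
describe uu;
```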


Re: Alias Confusion

Posted by Chun Yang <cy...@contractor.salesforce.com>.
Hi all,

I'm walking through a Pig script in grunt, but I'm getting stuck on some
issues with nested foreach. I'm using Pig version 0.9.2.

I'm trying to find the number of unique users from the relation 'top100'.

grunt> describe top100
top100: {name: chararray,licenses: long,instance: chararray,transactions: long,users: {(projected::userId: chararray)},runTimes: {(projected::runTime: double)}}

grunt> uu = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers as uniqUsers;
>> }
ERROR 1200: Pig script failed to parse:
<line 132, column 9> Invalid scalar projection: uniqUsers : A column needs to be projected from a relation for it to be used as a scalar

I realized that I had defined a relation named uniqUsers earlier in the script,
but I didn't expect it to conflict with an alias inside the nested foreach
block. The schema of that relation is:

grunt> describe uniqUsers
uniqUsers: {key: chararray,uniqUsers: long}

I tried a different alias for the nested distinct, and it seems to work:

grunt> uu = foreach top100 {
>> un = distinct users;
>> generate un as uniqUsers;
>> }
grunt> describe uu
uu: {un: {(projected::userId: chararray)}}
grunt> uu = foreach top100 {
>> un = distinct users;
>> generate COUNT(un) as uniqUsers;
>> }
grunt> describe uu
uu: {uniqUsers: long}
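The statement above yields one count per row of top100. If the goal is instead a single number of distinct users across all of top100 (the question does not say which is intended), a sketch along these lines flattens the bags first. All aliases here are hypothetical, and it assumes the users bag holds one userId field per tuple, as the describe output shows.

```pig
-- One output row per user entry; the bag's single field is renamed to userId.
allUsers    = foreach top100 generate flatten(users) as (userId);
-- Remove duplicate user IDs across all rows of top100.
distinctAll = distinct allUsers;
-- Put every remaining row into a single group so it can be counted.
allGrouped  = group distinctAll all;
-- COUNT skips tuples whose first field is null, so null IDs are not counted.
totalUniq   = foreach allGrouped generate COUNT(distinctAll) as numUniqUsers;
describe totalUniq;
```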

Out of curiosity, I also tried the following, but I don't understand the
results:

grunt> u2 = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers.key;
>> }
grunt> describe u2
u2: {projected::userId: chararray}

grunt> u3 = foreach top100 {
>> uniqUsers = distinct users;
>> generate uniqUsers.uniqUsers;
>> }
grunt> describe u3
u3: {projected::userId: chararray}

Specifically, what is actually in u3? Why is its field a chararray when
uniqUsers.uniqUsers in the earlier relation is a long? And why is the field
still named projected::userId?
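One way to look at what u3 actually contains is to ask grunt directly. EXPLAIN prints the logical, physical and MapReduce plans Pig built, which may show what the projection inside the nested block was resolved against; a LIMIT followed by DUMP prints a few real rows. The alias u3Sample is hypothetical.

```pig
-- Print the plans Pig built for u3 (does not run a job).
explain u3;
-- Print a small sample of u3's rows (this does run a job).
u3Sample = limit u3 5;
dump u3Sample;
```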

Thanks for any help!

-Chun

PS: Sorry for the double post; I accidentally hit the keyboard shortcut for
Send.