Posted to user@pig.apache.org by Ashish Dobhal <do...@gmail.com> on 2014/07/21 14:44:28 UTC

Problem in understanding UDF COUNT

a = load '/user/hue/word_count_text.txt';
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
c = group b by word;
d = foreach c generate COUNT(b), group;

I want to know what the input to the UDF COUNT would be in this
case. Also, what does it mean for b to be passed as an argument?

Also, I am still not clear about how COUNT operates.

Thanks

Ashish

Re: Problem in understanding UDF COUNT

Posted by Serega Sheypak <se...@gmail.com>.
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
--result
--aa
--bb
--cc
--cc
--cc

c = group b by word;
-- (aa, {(aa)})
-- (bb, {(bb)})
-- (cc, {(cc),(cc),(cc)})

d = foreach c generate COUNT(b), group;
-- (1, aa)
-- (1, bb)
-- (3, cc)
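
For readers more at home in a general-purpose language, the three statements above can be modelled in plain Python. This is only a sketch of the semantics, not of how Pig actually executes the script; the helper names (tokenize, group_by_word, count) and the two sample input lines are made up for illustration.

```python
from collections import defaultdict

def tokenize(line):
    # Rough stand-in for flatten(TOKENIZE(...)): split on whitespace only.
    return line.split()

def group_by_word(words):
    # Stand-in for "c = group b by word": every output row is (group, b),
    # where b is the bag of rows that share that key.
    bags = defaultdict(list)
    for w in words:
        bags[w].append((w,))          # rows of b are one-field tuples: (word,)
    return sorted(bags.items())

def count(bag):
    # Pig's COUNT skips tuples whose first field is null (None here).
    return sum(1 for row in bag if row[0] is not None)

lines = ["aa bb cc", "cc cc"]         # hypothetical file contents
b = [w for line in lines for w in tokenize(line)]
c = group_by_word(b)

# "d = foreach c generate COUNT(b), group;"
# Inside the foreach, b names the bag field of the current row of c,
# not the whole relation b defined earlier.
d = [(count(bag), group) for group, bag in c]
print(d)
```

The loop header "for group, bag in c" is the point of the sketch: each row of c carries its own small bag, and COUNT sees only that bag.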



Re: Problem in understanding UDF COUNT

Posted by Serega Sheypak <se...@gmail.com>.
You are welcome, hope this helps you with Pig. It's really easy.



Re: Problem in understanding UDF COUNT

Posted by Ashish Dobhal <do...@gmail.com>.
Thanks Serega Sheypak.



Re: Problem in understanding UDF COUNT

Posted by Serega Sheypak <se...@gmail.com>.
The best way to get answers to such easy questions:
1. Read the docs.
2. Create a sample script and run it.

The docs say that a group (a bag of tuples having the same 'stars' value) would be
passed to your UDF.
I can't understand what confuses you. These things are really basic.
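
To make this concrete, here is a small Python model (not Pig) of what a UDF receives after a GROUP. It is a sketch under assumptions: my_udf, the sample rows, and the choice to return the movie names are all hypothetical, and only the shape of the input matters.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of relation a: (name, movid, stars, comment)
a = [
    ("Alpha", 1, 5, "great"),
    ("Beta", 2, 3, "ok"),
    ("Gamma", 3, 5, "loved it"),
]

def my_udf(bag):
    # Called once per group; receives every tuple of that group at once.
    return [row[0] for row in bag]

# b = group a by stars;  -> rows of the form (group, a)
by_stars = itemgetter(2)
b = [(stars, list(rows))
     for stars, rows in groupby(sorted(a, key=by_stars), key=by_stars)]

# c = foreach b generate myudf(a);  -> one UDF call per row of b
c = [my_udf(bag) for _, bag in b]
print(c)
```

There are two groups (stars 3 and stars 5), so my_udf runs twice, each time on a whole bag rather than on a single tuple.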



Re: Problem in understanding UDF COUNT

Posted by Ashish Dobhal <do...@gmail.com>.
Sorry,
I meant: group a by stars;



Re: Problem in understanding UDF COUNT

Posted by Serega Sheypak <se...@gmail.com>.
a = load ....
--
b = group movies by stars;
-- error here: movies is not an alias

c = foreach b generate myudf(a);



> > > > On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > Have you seen this documentation and blog?
> > > > >
> > > > > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> > > > > http://pig.apache.org/docs/r0.9.2/func.html#count
> > > > >
> > > > > They explain this in detail.
> > > > >
> > > > > Regards,
> > > > > Shahab

Re: Problem in understanding UDF COUNT

Posted by Ashish Dobhal <do...@gmail.com>.
Thanks Shahab and William, I am now clear about the COUNT functionality. But
I still have a doubt about how UDFs work in general.
Example:
a = load 'movies' using PigStorage() as (name:chararray,
movid:int, stars:int, comment:varchar(300));
b = group movies by stars;
c = foreach b generate myudf(a);
In this case, what would be the input to the UDF: the entire group or a
single tuple of that group?
I think the input would be a single tuple of that group for each
iteration, but I am not sure.
Thanks.
Ashish.


On Tue, Jul 22, 2014 at 5:30 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> That is confusing, and it is something that William Dowling explained in an
> email below.
>
> The scope of the alias b has changed. Now, when used with 'foreach' on c,
> the alias/variable b is used just to count what belongs to the current
> c.
>
> Imagine that b is a bag of all the records, but when it is passed to the
> COUNT function in 'foreach c', only those items/records that belong to the
> current c are filtered or counted.
>
> Take a look at this link that I sent earlier (especially the age_counts
> example):
> http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
>
> It does not explain everything but it is a more detailed example with
> comments and perhaps would help you to understand this Pig specific
> concept.
>
> Regards,
> Shahab

Re: Problem in understanding UDF COUNT

Posted by Shahab Yunus <sh...@gmail.com>.
That is confusing, and it is something that William Dowling explained in an
email below.

The scope of the alias b has changed. When used inside 'foreach' on c, the
alias b refers only to the records that belong to the current group in c.

Imagine that although b is a bag of all the records, when it is passed to
the COUNT function in 'foreach c', only those records that belong to the
current group of c are counted.
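
The scoping described above can be modeled in a few lines of Python. This is only a sketch of the semantics, not how Pig is implemented: `group_by` and `count` are hypothetical stand-ins for Pig's GROUP and COUNT, the sample words are illustrative, and Pig's handling of nulls is ignored.

```python
from collections import defaultdict

def group_by(relation, key_index):
    """Model of Pig's GROUP: one (group, bag) row per distinct key."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key_index]].append(row)
    return [(key, bag) for key, bag in bags.items()]

def count(bag):
    """Model of COUNT: it is handed one bag and returns its size."""
    return len(bag)

# Relation b: one single-field tuple per word (illustrative data).
b = [("aa",), ("bb",), ("cc",), ("cc",), ("cc",)]

# Relation c: each row has a field 'group' and a bag field that Pig names 'b'.
c = group_by(b, 0)

# Inside 'foreach c', the name b means the bag field of the current row,
# not the whole relation b defined above.
d = [(count(bag_b), group) for group, bag_b in c]
print(d)
```

In this model `count` never sees the full relation b; it is called once per row of c and receives only that row's bag.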

Take a look at this link that I sent earlier (especially the age_counts
example):
http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/

It does not explain everything, but it is a more detailed example with
comments and perhaps would help you to understand this Pig-specific concept.

Regards,
Shahab


On Tue, Jul 22, 2014 at 12:07 AM, Ashish Dobhal <do...@gmail.com>
wrote:

> In this case, does b refer to the tuples corresponding to a single
> group? If so, I still do not get the point, because b is a bag that contains
> all the records and not only the records of a single group.

RE: Problem in understanding UDF COUNT

Posted by Ashish Dobhal <do...@gmail.com>.
In this case, does b refer to the tuples corresponding to a single
group? If so, I still do not get the point, because b is a bag that contains
all the records and not only the records of a single group.

On Jul 21, 2014 8:33 PM, <wi...@thomsonreuters.com> wrote:
>
> This was hard for me to get when I started using Pig, and it still annoys
> me after 1.5 years' experience with Pig. In mathematics and logic,
> quantifiers (like "for each", "there exists") bind variables that occur in
> their scope:
> (for each x)(there exists y) [y > x]
>
> The (for each x) binds x in (there exists y) [y > x]
>
> But in Pig the variable x in (for each x) *does not bind occurrences of
> x* in the following subexpression. IMO this is an unnecessary stumbling
> block to people learning Pig who have a background in math or logic.
>
> Here is how you can read
>         foreach c generate COUNT(b), group;
> so it makes sense:
>         c's components are "group" and (bag) b, so:
>         foreach (group, b) in c generate COUNT(b), group;
>
> I would love it if the Pig syntax were extended to allow quantifiers like
> "foreach (group, b) in c" but I don't know how feasible that would be.
>
> William F Dowling
> Senior Technologist
> Thomson Reuters

Re: Problem in understanding UDF COUNT

Posted by Shahab Yunus <sh...@gmail.com>.
Ashish,

*d = foreach c generate COUNT(b), group;*

I interpret or visualize it as:

c is a structure consisting of groups of words or items. Imagine a list
where each entry is a groupid and each groupid points to a collection of
objects/items belonging to that same groupid. We can call this collection
b. You can also imagine c as a nested map, where the keys are the distinct
groupids and each value is a collection of items (again, let us call it b)
belonging to that key.

So, now you want to count how many items exist for each groupid in the list
(or map) c. Recall that we are calling the group of items for each entry of
c b.

c[0]=new york points to  [1,2,3]
c[1]=philadelphia points to  [1,2,3,4]
c[2]=boston points to  [5,6,7,8,9]

So in the above example, in the c list we have 3 unique groupids (new york,
philadelphia and boston), and each points to its own collection of items that
we are calling b. We want to know the count for each group, which is 3, 4 and
5 for new york, philadelphia and boston respectively.

Now coming back to the Pig statement once again:
*d = foreach c generate COUNT(b), group;*

This is exactly what we are doing:
*Counting, for each c (new york, philadelphia, boston in our example), how
many items are in its b (3, 4 and 5).*

The second expression in the Pig statement, 'group', gives us the group
id (the key of each entry in c) alongside each count of b.
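
The map picture above can be written out as a short Python sketch. It only models the idea: a Python dict stands in for the grouped relation c, and `len` stands in for COUNT (both are assumptions for illustration, not Pig's actual data structures).

```python
# c modeled as a mapping: group id -> bag b (the items from the example above).
c = {
    "new york": [1, 2, 3],
    "philadelphia": [1, 2, 3, 4],
    "boston": [5, 6, 7, 8, 9],
}

# d = foreach c generate COUNT(b), group;
d = [(len(b), group) for group, b in c.items()]

for n, group in d:
    print(n, group)
```

Each output row pairs the size of one group's bag with that group's id, which is what the Pig statement produces per group.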

Regards,
Shahab




On Mon, Jul 21, 2014 at 11:02 AM, <wi...@thomsonreuters.com>
wrote:

> This was hard for me to get when I started using Pig, and it still annoys
> me after 1.5 years' experience with Pig. In mathematics and logic,
> quantifiers (like "for each", "there exists") bind variables that occur in
> their scope:
> (for each x)(there exists y) [y > x]
>
> The (for each x) binds x in (there exists y) [y > x]
>
> But in Pig the variable x in (for each x) *does not bind occurrences of x*
> in the following subexpression. IMO this is an unnecessary stumbling block
> to people learning Pig who have a background in math or logic.
>
> Here is how you can read
>         foreach c generate COUNT(b), group;
> so it makes sense:
>         c's components are "group" and (bag) b, so:
>         foreach (group, b) in c generate COUNT(b), group;
>
> I would love it if the Pig syntax were extended to allow quantifiers like
>  "foreach (group, b) in c" but I don't know how feasible that would be.
>
> William F Dowling
> Senior Technologist
> Thomson Reuters

RE: Problem in understanding UDF COUNT

Posted by wi...@thomsonreuters.com.
This was hard for me to get when I started using Pig, and it still annoys me after 1.5 years' experience with Pig. In mathematics and logic, quantifiers (like "for each", "there exists") bind variables that occur in their scope:
(for each x)(there exists y) [y > x]

The (for each x) binds x in (there exists y) [y > x]

But in Pig the variable x in (for each x) *does not bind occurrences of x* in the following subexpression. IMO this is an unnecessary stumbling block to people learning Pig who have a background in math or logic.

Here is how you can read
	foreach c generate COUNT(b), group;
so it makes sense:
	c's components are "group" and (bag) b, so:
	foreach (group, b) in c generate COUNT(b), group;

I would love it if the Pig syntax were extended to allow quantifiers like  "foreach (group, b) in c" but I don't know how feasible that would be.
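
For comparison, here is a small Python sketch of that reading, in which the row's fields are bound by name explicitly. The `foreach` helper and the dict layout of the rows are assumptions made for illustration; they are not part of Pig.

```python
def foreach(relation, generate):
    """Apply generate to every row, exposing the row's fields by name."""
    return [generate(**row) for row in relation]

# c after 'group b by word': every row has the fields 'group' and 'b'.
c = [
    {"group": "aa", "b": [("aa",)]},
    {"group": "bb", "b": [("bb",)]},
    {"group": "cc", "b": [("cc",), ("cc",), ("cc",)]},
]

# Reads like: foreach (group, b) in c generate COUNT(b), group;
d = foreach(c, lambda group, b: (len(b), group))
print(d)
```

The lambda's parameter list plays the role of the "(group, b) in c" binder: it makes visible that b is a field of the current row of c.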

William F Dowling
Senior Technologist
Thomson Reuters


-----Original Message-----
From: Ashish Dobhal [mailto:dobhalashish772@gmail.com] 
Sent: Monday, July 21, 2014 10:34 AM
To: user@pig.apache.org
Subject: Re: Problem in understanding UDF COUNT

Thanks, Shahab.
My doubt is why we are passing the bag b, and not the bag c, as the argument to the COUNT(b) function.
It is the bag c that contains the groups, not the bag b.
Thanks.



Re: Problem in understanding UDF COUNT

Posted by Ashish Dobhal <do...@gmail.com>.
Thanks, Shahab.
My doubt is why we are passing the bag b, and not the bag c, as the argument
to the COUNT(b) function.
It is the bag c that contains the groups, not the bag b.
Thanks.


On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> Have you seen this documentation and blog?
> http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> http://pig.apache.org/docs/r0.9.2/func.html#count
>
> They explain this in detail.
>
> Regards,
> Shahab

Re: Problem in understanding UDF COUNT

Posted by Shahab Yunus <sh...@gmail.com>.
Have you seen this documentation and blog?
http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
http://pig.apache.org/docs/r0.9.2/func.html#count

They explain this in detail.

Regards,
Shahab


On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal <do...@gmail.com>
wrote:

> a = load '/user/hue/word_count_text.txt';
> b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> c = group b by word;
> d = foreach c generate COUNT(b), group;
>
> I want to know what would be the input to the UDF COUNT in this
> case. Also, what is the meaning of b being passed as an argument?
>
> Also, I am still not clear about how COUNT operates.
>
> Thanks
>
> Ashish
>