Posted to dev@pig.apache.org by deepak kumar v <de...@gmail.com> on 2011/03/08 06:55:57 UTC

PIG-671

Hi Pig Developers,
This is my first dive into open source contribution, and I hope to dive deep.

I was going through https://issues.apache.org/jira/browse/PIG-671 and
observed the following in COUNT.java.

COUNT.exec() always retrieves the first item from the input tuple, assumes
it is a bag, and counts the number of items in that bag.
Even if we pass multiple arguments to COUNT(), it always picks the first
argument.

There are a few ways we could handle this:
a) Leave it as is, because it returns the correct result for counting the
number of items in the first argument.
OR
b) Check the size of the input tuple in COUNT.exec() and, if it is not 1,
throw an ExecException or an IllegalArgumentException (which might be the
correct one), which will cause the map job to fail.

Let me know how we should go about it.


Regards,
Deepak
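
For illustration, the difference between options (a) and (b) can be shown
without Pig on the classpath. This is a minimal sketch under stated
assumptions, not the PIG-671 patch: a plain List<Object> stands in for the
input tuple, a List stands in for the bag, and the class and method names are
hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch: a List<Object> stands in for Pig's input tuple and a
// List<?> stands in for a bag. None of these names are Pig's real classes.
class CountArgCheckSketch {

    // Option (a): read field 0 and ignore anything that follows it.
    static long countFirstOnly(List<Object> input) {
        List<?> bag = (List<?>) input.get(0);
        return bag.size();
    }

    // Option (b): reject any call that does not pass exactly one argument.
    static long countStrict(List<Object> input) {
        if (input.size() != 1) {
            throw new IllegalArgumentException(
                "COUNT expects exactly 1 argument, got " + input.size());
        }
        List<?> bag = (List<?>) input.get(0);
        return bag.size();
    }

    public static void main(String[] args) {
        List<Object> bag = Arrays.asList("x", "y", "z");
        List<Object> oneArg = Arrays.asList((Object) bag);
        List<Object> twoArgs = Arrays.asList((Object) bag, (Object) bag);

        // Option (a) accepts the two-argument call and counts the first bag.
        System.out.println("lenient, two args: " + countFirstOnly(twoArgs));

        // Option (b) accepts one argument and rejects two.
        System.out.println("strict, one arg: " + countStrict(oneArg));
        try {
            countStrict(twoArgs);
        } catch (IllegalArgumentException e) {
            System.out.println("strict, two args: " + e.getMessage());
        }
    }
}
```

The drawback of option (b) is visible in the sketch: the check only fires
when the function is actually called, which in Pig means after the job has
already started.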

Re: PIG-671

Posted by deepak kumar v <de...@gmail.com>.
Hi Dmitriy,
Thanks for the help.

I was able to set up the project in Eclipse after adjusting the build path
a bit. I am now trying to run the unit tests (TestBuiltin.java) from Eclipse
so that I can step through the code in debug mode. The tests run from the
command line, but in Eclipse they fail after some time. Please find the
exceptions attached.

Any help would be appreciated.

Regards,
Deepak


Re: PIG-671

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Deepak,
Don't create from Ant Buildfile. Just import an existing project (create new
java project -> uncheck "create in default location" -> point it at where
you have the project checked out and where you ran ant eclipse-files). You
might need to change the build path a bit, adding / removing jars and all
that, but it should be pretty straightforward from there.

D


Re: PIG-671

Posted by deepak kumar v <de...@gmail.com>.
Hi,
Sure, will do that.

Meanwhile, I tried to import the Pig source into Eclipse using the New
Project wizard with the "Create a Java Project from Ant Buildfile" option.
It fails with the error
"Reference ${cp} not found" and does not import the source code. An empty
new project "Pig" is created as a result.

Before trying the above I ran

ant eclipse-files

which completed successfully.


Regards,

Deepak



Re: PIG-671

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Nice work.

You are going to want to make sure COUNT also works on the scenarios it's
supposed to work on. So far you only seem to be testing failures?

Also, write it up as proper unit tests so we don't get regressions.

D


Re: PIG-671

Posted by deepak kumar v <de...@gmail.com>.
Hi Dmitriy,
I will check out TestBuiltins.java once my Eclipse setup is ready.
Meanwhile, I tried the two scenarios that you mentioned.

1) Schema defined for a
grunt> a = load 'test.txt' as (data:chararray);
grunt> b = group a all;
grunt> describe a;
a: {data: chararray}
grunt> describe b;
b: {group: chararray,a: {(data: chararray)}}
grunt> x = foreach b generate COUNT(a.data, a.data);
grunt> dump x;
2011-03-09 00:06:40,953 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
explicit cast.

2) Schema not defined for a
grunt> a = load 'test.txt';
grunt> b = group a all;
grunt> describe a;
Schema for a unknown.
grunt> describe b;
b: {group: chararray,a: {(null)}}
grunt> x = foreach b generate COUNT(a.$0, a.$0);
grunt> dump x;
2011-03-09 00:07:58,715 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
explicit cast.


The changes seem to work in both scenarios.

Regards,
Deepak



On Tue, Mar 8, 2011 at 10:45 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> ant test doesn't hang, it just runs for a very long time. If you want to
> test something specific, you can name the test class like so:
>
> ant test -Dtestcase=TestBuiltins
> (this will run the tests in TestBuiltins.java)
>
> COUNT tests are probably in TestBuiltins or in TestAlgebraic. Look around.
>
> You definitely want to add some tests to make sure that COUNT still works
> on the cases where it's supposed to work, and that the Pig parser no longer
> allows COUNT with the wrong number or type of arguments.
>
> I would test in particular what happens when a bag is supplied for which a
> schema is known -- Pig might be making a distinction between a bag with a
> known schema and a bag with an unknown schema, and we definitely want both
> of those to work.
>
> D
>
>
> On Tue, Mar 8, 2011 at 1:58 AM, deepak kumar v <de...@gmail.com>wrote:
>
>> Hi,
>> Please find attached a patch that fixes PIG-671. It uses the approach
>> mentioned in the previous email.
>> I could not find any test cases for COUNT.java; besides, ant test just
>> hung.
>>
>> Output:
>> grunt> a = load 'test.txt';
>> grunt> x = foreach a generate COUNT(a.$0,a.$0);
>> grunt> dump x;
>> 2011-03-08 14:45:03,408 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1045: Could not infer the matching function for
>> org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
>> explicit cast.
>> Details at logfile:
>> /Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
>> grunt> b = group a all;
>> grunt> x = foreach b generate COUNT(a.$0,a.$0);
>> grunt> dump x;
>> 2011-03-08 14:45:19,668 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1045: Could not infer the matching function for
>> org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
>> explicit cast.
>> Details at logfile:
>> /Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
>> grunt> quit
>>
>> Regards,
>> Deepak
>>
>> On Tue, Mar 8, 2011 at 12:12 PM, deepak kumar v <de...@gmail.com>wrote:
>>
>>> Hi Dmitriy,
>>>
>>> I was looking at SUBSTRING.java, and that (getArgToFuncMapping) is exactly
>>> what I am trying now with COUNT.
>>> I am waiting for the build to complete so I can test my changes before
>>> posting this option.
>>>
>>> Regards,
>>> Deepak
>>>
>>>
>>> On Tue, Mar 8, 2011 at 11:56 AM, Dmitriy Ryaboy <dv...@gmail.com>wrote:
>>>
>>>> Actually I think if you just implement getArgToFuncMapping for COUNT,
>>>> where you only return a mapping for a single bag argument, pig will notice
>>>> that the wrong number of args is supplied during the compilation phase and
>>>> no runtime exceptions will be required.
>>>>
>>>> I haven't checked how well the funcSpec mapping works with Bags, that's
>>>> something to experiment with.
>>>>
>>>> D
>>>>
>>>>
>>>> On Mon, Mar 7, 2011 at 9:55 PM, deepak kumar v <de...@gmail.com>wrote:
>>>>
>>>>> Hi Pig Developers,
>>>>> This is my first dive into open source contribution, and I hope to dive
>>>>> deep.
>>>>>
>>>>> I was going through https://issues.apache.org/jira/browse/PIG-671 and
>>>>> observed the following in COUNT.java.
>>>>>
>>>>> COUNT.exec() always retrieves the first item from the input tuple,
>>>>> assumes it is a bag, and counts the number of items in that bag.
>>>>> Even if we pass multiple arguments to COUNT(), it always picks the first
>>>>> argument.
>>>>>
>>>>> There are a few ways we could handle this:
>>>>> a) Leave it as is, because it returns the correct result for counting
>>>>> the number of items in the first argument.
>>>>> OR
>>>>> b) Check the size of the input tuple in COUNT.exec() and, if it is not
>>>>> 1, throw an ExecException or an IllegalArgumentException (which might
>>>>> be the correct one), which will cause the map job to fail.
>>>>>
>>>>> Let me know how we should go about it.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Deepak
>>>>>
>>>>
>>>>
>>>
>>
>

Re: PIG-671

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
ant test doesn't hang, it just runs for a very long time. If you want to
test something specific, you can name the test class like so:

ant test -Dtestcase=TestBuiltin
(this will run the tests in TestBuiltin.java)

COUNT tests are probably in TestBuiltin or in TestAlgebraic. Look around.

You definitely want to add some tests to make sure that COUNT still works on
the cases where it's supposed to work, and that the Pig parser no longer
allows COUNT with the wrong number or type of arguments.

I would test in particular what happens when a bag is supplied for which a
schema is known -- Pig might be making a distinction between a bag with a
known schema and a bag with an unknown schema, and we definitely want both
of those to work.

D

On Tue, Mar 8, 2011 at 1:58 AM, deepak kumar v <de...@gmail.com> wrote:

> Hi,
> Please find attached a patch that fixes PIG-671, using the approach
> mentioned in the previous email.
> I could not find any test cases for COUNT.java; besides, ant test just hung.
>
> Output:
> grunt> a = load 'test.txt';
> grunt> x = foreach a generate COUNT(a.$0,a.$0);
> grunt> dump x;
> 2011-03-08 14:45:03,408 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
> explicit cast.
> Details at logfile:
> /Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
> grunt> b = group a all;
> grunt> x = foreach b generate COUNT(a.$0,a.$0);
> grunt> dump x;
> 2011-03-08 14:45:19,668 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
> explicit cast.
> Details at logfile:
> /Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
> grunt> quit
>
> Regards,
> Deepak

Re: PIG-671

Posted by deepak kumar v <de...@gmail.com>.
Hi,
Please find attached a patch that fixes PIG-671, using the approach mentioned
in the previous email.
I could not find any test cases for COUNT.java; besides, ant test just hung.

Output:
grunt> a = load 'test.txt';
grunt> x = foreach a generate COUNT(a.$0,a.$0);
grunt> dump x;
2011-03-08 14:45:03,408 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
explicit cast.
Details at logfile:
/Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
grunt> b = group a all;
grunt> x = foreach b generate COUNT(a.$0,a.$0);
grunt> dump x;
2011-03-08 14:45:19,668 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an
explicit cast.
Details at logfile:
/Users/deepakkv/Documents/opensource/pig/working/pig_1299575686422.log
grunt> quit

Regards,
Deepak

On Tue, Mar 8, 2011 at 12:12 PM, deepak kumar v <de...@gmail.com> wrote:

> Hi Dmitriy,
>
> I was looking at SUBSTRING.java, and that (getArgToFuncMapping) is exactly
> what I am trying now with COUNT.
> I am waiting for the build to complete so I can test my changes before
> posting this option.
>
> Regards,
> Deepak

Re: PIG-671

Posted by deepak kumar v <de...@gmail.com>.
Hi Dmitriy,

I was looking at SUBSTRING.java, and that (getArgToFuncMapping) is exactly
what I am trying now with COUNT.
I am waiting for the build to complete so I can test my changes before
posting this option.

Regards,
Deepak

On Tue, Mar 8, 2011 at 11:56 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Actually, I think that if you just implement getArgToFuncMapping for COUNT
> and return a mapping only for a single bag argument, Pig will notice during
> the compilation phase that the wrong number of args is supplied, and no
> runtime exceptions will be required.
>
> I haven't checked how well the funcSpec mapping works with bags; that's
> something to experiment with.
>
> D

Re: PIG-671

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Actually, I think that if you just implement getArgToFuncMapping for COUNT
and return a mapping only for a single bag argument, Pig will notice during
the compilation phase that the wrong number of args is supplied, and no
runtime exceptions will be required.

I haven't checked how well the funcSpec mapping works with bags; that's
something to experiment with.
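
A rough, self-contained model of that idea, not Pig code: in Pig the hook is
EvalFunc.getArgToFuncMapping(), which returns a list of FuncSpec entries
describing the argument schemas a function accepts. The sketch below swaps
those for stand-in types (ArgType, CountSignatureSketch, acceptedSignatures
and matches are all made-up names for illustration) to show why declaring a
single one-bag signature lets the front end reject COUNT(a.$0, a.$0) before
any job runs. How Pig actually matches bag schemas is the open question noted
above and is not modeled here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Stand-in for Pig's type tags; hypothetical, not the real DataType class. */
enum ArgType { BAG, CHARARRAY, INTEGER }

/** Models a function that declares its accepted argument lists up front. */
final class CountSignatureSketch {

    /** COUNT declares exactly one accepted signature: a single bag. */
    static List<List<ArgType>> acceptedSignatures() {
        return Collections.singletonList(Collections.singletonList(ArgType.BAG));
    }

    /** Front-end check: true only if the call's argument types equal a declared signature. */
    static boolean matches(List<ArgType> callArgs) {
        for (List<ArgType> signature : acceptedSignatures()) {
            if (signature.equals(callArgs)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Models COUNT(a): one bag argument, accepted.
        System.out.println(matches(Arrays.asList(ArgType.BAG)));
        // Models COUNT(a.$0, a.$0): two bag arguments, rejected at plan time.
        System.out.println(matches(Arrays.asList(ArgType.BAG, ArgType.BAG)));
    }
}
```

Because the mismatch is detected while matching signatures, nothing has to be
thrown from exec() at runtime, which is the advantage over option (b).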

D

On Mon, Mar 7, 2011 at 9:55 PM, deepak kumar v <de...@gmail.com> wrote:

> Hi Pig Developers,
> This is my first dive into open source contribution and I hope to dive
> deep.
>
> I was going through https://issues.apache.org/jira/browse/PIG-671 and
> observed the following with COUNT.java.
>
> COUNT.exec() always retrieves the first item from the input tuple, which it
> assumes is a bag, and counts the number of items in the bag.
> Even if we pass multiple arguments to COUNT(), it will always pick the
> first argument.
>
> There are a few ways we can handle this:
> a) Leave it as is, because it returns the correct result for counting the
> number of items in the first argument.
> OR
> b) Check the size of the input tuple in COUNT.exec() and, if it is not 1,
> throw ExecException or IllegalArgumentException (might be correct),
> which will cause the map job to fail.
>
> Let me know how we should go about it.
>
>
> Regards,
> Deepak
>

Re: PIG-671

Posted by Scott Carey <sc...@richrelevance.com>.

On 3/7/11 9:55 PM, "deepak kumar v" <de...@gmail.com> wrote:

>Hi Pig Developers,
>This is my first dive into open source contribution and I hope to dive
>deep.
>
>I was going through https://issues.apache.org/jira/browse/PIG-671 and
>observed the following with COUNT.java.
>
>COUNT.exec() always retrieves the first item from the input tuple, which it
>assumes is a bag, and counts the number of items in the bag.
>Even if we pass multiple arguments to COUNT(), it will always pick the
>first argument.
>
>There are a few ways we can handle this:
>a) Leave it as is, because it returns the correct result for counting the
>number of items in the first argument.
>OR
>b) Check the size of the input tuple in COUNT.exec() and, if it is not 1,
>throw ExecException or IllegalArgumentException (might be correct),
>which will cause the map job to fail.

What about:

c) Count the number of non-null tuples in the bag (same as COUNT_STAR as
long as null tuples are not inserted somehow).  This is what users seem to
expect; I've seen several bugs due to users doing COUNT(FOO) and not
expecting it to be equivalent to COUNT(FOO.$0).
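
To make the difference concrete, here is a small stand-alone model, assuming
a bag is just a list of tuples and a tuple is a list of fields
(CountSemanticsSketch and its methods are hypothetical names, not Pig
classes). It contrasts counting tuples whose first field is non-null, which
is what COUNT(FOO) being equivalent to COUNT(FOO.$0) amounts to, with option
(c), counting every tuple that is itself non-null.

```java
import java.util.Arrays;
import java.util.List;

/** Models two possible meanings of COUNT over a bag; not Pig code. */
final class CountSemanticsSketch {

    /** Behaviour described for COUNT(FOO): skip tuples whose first field is null. */
    static long countFirstFieldNonNull(List<List<Object>> bag) {
        long n = 0;
        for (List<Object> tuple : bag) {
            if (tuple != null && !tuple.isEmpty() && tuple.get(0) != null) {
                n++;
            }
        }
        return n;
    }

    /** Option (c): count every tuple that is itself non-null. */
    static long countNonNullTuples(List<List<Object>> bag) {
        long n = 0;
        for (List<Object> tuple : bag) {
            if (tuple != null) {
                n++;
            }
        }
        return n;
    }

    /** Three tuples; the second one has a null first field. */
    static List<List<Object>> sampleBag() {
        return Arrays.asList(
            Arrays.<Object>asList("x", 1),
            Arrays.<Object>asList(null, 2),
            Arrays.<Object>asList("y", null));
    }

    public static void main(String[] args) {
        System.out.println(countFirstFieldNonNull(sampleBag())); // prints 2
        System.out.println(countNonNullTuples(sampleBag()));     // prints 3
    }
}
```

The two counts diverge only when some tuple has a null first field, which is
the case users appear not to expect.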

>
>Let me know how we should go about it.
>
>
>Regards,
>Deepak