Posted to user@pig.apache.org by Cameron Gandevia <cg...@gmail.com> on 2011/11/03 19:32:43 UTC

Exception in SUM command

Hi,
I am experiencing the following issue in part of my Pig script.

data = FOREACH metricLogLine GENERATE host, REGEX_EXTRACT_ALL(body,
'.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS regex;
eventCountData = FILTER data BY regex is not null;
eventCountData = FOREACH data GENERATE host, FLATTEN(regex) AS
(name:CHARARRAY, count:DOUBLE);

If I describe the data I get:
eventCountData: {host: chararray,name: chararray,count: double}

I then perform

eventNameGroupPerHost = GROUP eventCountData BY (name,host);

and I get
eventNameGroupPerHost: {group: (name: chararray,host: chararray),eventCountData: {host: chararray,name: chararray,count: double}}


overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
    group.host,
    group.name,
    SUM(eventCountData.count);

and I get

org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem while computing sum of doubles.
at org.apache.pig.builtin.DoubleSum.sum(DoubleSum.java:147)
at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:46)
at org.apache.pig.builtin.DoubleSum.exec(DoubleSum.java:41)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:310)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:357)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:131)

We then tried running it through our own UDF to see what was happening, and
we found:

java.lang.String cannot be cast to java.lang.Number, tuple:(0), field:0

It seems as though Pig thinks zero is a string. Is this normal or a bug?

Thanks

Re: Exception in SUM command

Posted by Daniel Dai <da...@hortonworks.com>.
Note that:
FOREACH data GENERATE host, FLATTEN(regex) AS (name:CHARARRAY,
count:DOUBLE);
does not convert count to DOUBLE automatically; you need to do the cast
yourself. What "describe" tells you is a lie. It is a known bug, see
https://issues.apache.org/jira/browse/PIG-2315

Daniel
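
For illustration, the explicit cast described in the reply above could look
like the sketch below. It assumes the captured groups arrive as chararray
values (as the "String cannot be cast to Number" message suggests), so it
declares them as chararray when flattening and casts in a separate FOREACH.
The aliases matched, flat and countText are illustrative names, not part of
the original script. The FILTER result is also kept under its own alias,
because in the original script the second assignment to eventCountData reads
from data and therefore discards the filter.

```pig
-- Assumes metricLogLine has the fields host and body, as in the original script.
data = FOREACH metricLogLine GENERATE host,
    REGEX_EXTRACT_ALL(body,
        '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') AS regex;

-- Keep only the lines that matched; a separate alias keeps the filter in effect.
matched = FILTER data BY regex IS NOT NULL;

-- Declare both captured groups as chararray, which is what they hold at runtime.
flat = FOREACH matched GENERATE host,
    FLATTEN(regex) AS (name:chararray, countText:chararray);

-- Explicit cast: the count is converted to double here, not by the AS clause.
eventCountData = FOREACH flat GENERATE host, name, (double)countText AS count;

eventNameGroupPerHost = GROUP eventCountData BY (name, host);
overviewEventsPerHost = FOREACH eventNameGroupPerHost GENERATE
    group.host,
    group.name,
    SUM(eventCountData.count);
```

With the cast in its own step, the type that describe reports for count
should match the values SUM actually receives.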

On Thu, Nov 3, 2011 at 11:55 AM, Andrea Leistra
<An...@concur.com>wrote:

> REGEX_EXTRACT_ALL returns a chararray.  If what you're getting back is a
> numerical value you will need to cast it as such before you can do math
> with it.

RE: Exception in SUM command

Posted by Andrea Leistra <An...@concur.com>.
REGEX_EXTRACT_ALL returns a chararray.  If what you're getting back is a numerical value you will need to cast it as such before you can do math with it.

-----Original Message-----
From: Cameron Gandevia [mailto:cgandevia@gmail.com]
Sent: Thursday, November 03, 2011 2:51 PM
To: user@pig.apache.org
Subject: Re: Exception in SUM command

So it seems like this problem is isolated in the REGEX function.

If I load the data using

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS (name:CHARARRAY,host:CHARARRAY,count:DOUBLE);

everything is fine.

If I load it

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS (name:CHARARRAY,host:CHARARRAY,count:CHARARRAY);
regexExtract = FOREACH raw GENERATE name, host, REGEX_EXTRACT_ALL(count,
'(\\d+)') AS regex;

it fails

Re: Exception in SUM command

Posted by Cameron Gandevia <cg...@gmail.com>.
So it seems like this problem is isolated to the REGEX function.

If I load the data using

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
(name:CHARARRAY,host:CHARARRAY,count:DOUBLE);

everything is fine.

If I load it using

raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
(name:CHARARRAY,host:CHARARRAY,count:CHARARRAY);
regexExtract = FOREACH raw GENERATE name, host, REGEX_EXTRACT_ALL(count,
'(\\d+)') AS regex;

it fails.

-- 
Thanks

Cameron Gandevia
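
For the reduced test case in this message, where only one group is captured,
another option is REGEX_EXTRACT. This is an assumption on my part, not
something proposed in the thread. REGEX_EXTRACT returns the group at the
requested index as a single chararray, so the cast can be written inline.
The aliases typed, grouped and totals are hypothetical names.

```pig
raw = LOAD '/test/testfile.txt' USING PigStorage('\t') AS
    (name:chararray, host:chararray, count:chararray);

-- REGEX_EXTRACT returns one captured group (index 1 = first group) as a chararray.
-- The explicit (double) cast converts it before any arithmetic is done.
typed = FOREACH raw GENERATE name, host,
    (double)REGEX_EXTRACT(count, '(\\d+)', 1) AS count;

grouped = GROUP typed BY (name, host);
totals = FOREACH grouped GENERATE group.name, group.host, SUM(typed.count);
```

Rows where the pattern does not match should yield a null count, which SUM
skips rather than failing on.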