Posted to user@pig.apache.org by da...@ya.ru on 2009/01/06 04:12:57 UTC

SUM() fails on longs

SUM can fail on a column declared as long. This seems to happen when
the actual values in the column are small enough to fit in an int: they
are serialized and probably passed to SUM as ints, and the LongSum()
code then throws an exception when it meets an int value.  Is this a
known issue?
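
To illustrate the suspected failure mode, here is a minimal plain-Java sketch. It does not use Pig at all: strictSum is a hypothetical stand-in that assumes a long-only sum casts each value to Long, which is a guess about how LongSum behaves, not its actual source.

```java
import java.util.Arrays;
import java.util.List;

public class LongCastDemo {
    // Hypothetical stand-in for a long-only sum: casts every value to Long.
    public static long strictSum(List<Object> values) {
        long sum = 0;
        for (Object o : values) {
            sum += (Long) o; // throws ClassCastException when o is an Integer
        }
        return sum;
    }

    public static void main(String[] args) {
        // All values boxed as Long: the sum works.
        List<Object> longs = Arrays.<Object>asList(1L, 1L);
        System.out.println(strictSum(longs));

        // One value boxed as Integer, as it might be after deserialization.
        List<Object> mixed = Arrays.<Object>asList(1L, 1);
        try {
            strictSum(mixed);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Java will not unbox an Integer through a (Long) cast, so a single int-typed value is enough to break a sum written this way.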

Re: SUM() fails on longs

Posted by da...@ya.ru.
I still don't see my answer in the archive, so I'm resending it.  Sorry
for the possible repeat.

This is the code:
onePhrases = load 'onePhrases' as (
   id: chararray, use: bag{t: tuple(date: int, region: int, count: long)});
otherPhrases = load 'otherPhrases' as (
   id: chararray, use: bag{t: tuple(date: int, region: int, count: long)});
a = cogroup onePhrases by id, otherPhrases by id;
b = foreach a generate
   group as id,
   flatten(JoinBagsOfBags(onePhrases.use, otherPhrases.use));
c = foreach b generate
   id,
   use::date as date,
   use::region as region,
   use::count as count;
d = group c by (id, date, region);
e = foreach d generate
   group.id as id,
   group.date as date,
   group.region as region,
   SUM(c.count) as count;

It produces the following stack trace:
2009-01-07 20:16:28,294 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher
- Error message from task (reduce)
task_200901050110_0136_r_000078java.io.IOException: Received Error
while processing the reduce plan: Caught error from UDF
org.apache.pig.builtin.LongSum[null]
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:307)
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:247)
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
       at org.apache.hadoop.mapred.Child.main(Child.java:155)

The actual data in the input files looks like this:
!ThkA&se~       {(20080827,2,1),(20080827,84,1)}

The count field values do not have an 'L' suffix; they were produced
earlier by COUNT and serialized.

If I use a custom sum function, the error goes away:
           if (o instanceof Long)
               sum += (Long)o;
           else if (o instanceof Integer)
               sum += (Integer)o;
           else
               throw
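
For readers who want something runnable, the fragment above could be fleshed out roughly as follows. This is a plain-Java sketch rather than a Pig UDF: the class name TolerantSum, the List<Object> input and the IllegalArgumentException in the last branch are all assumptions, since the fragment does not show what is thrown or how the values are read from the bag.

```java
import java.util.Arrays;
import java.util.List;

public class TolerantSum {
    // Sums values that may arrive boxed as either Long or Integer.
    public static long sum(List<Object> values) {
        long sum = 0;
        for (Object o : values) {
            if (o instanceof Long) {
                sum += (Long) o;
            } else if (o instanceof Integer) {
                sum += (Integer) o; // widened from int to long
            } else {
                // Assumption: the original fragment elides what is thrown here.
                throw new IllegalArgumentException("Unexpected value type: "
                        + (o == null ? "null" : o.getClass().getName()));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two counts like those in the sample row, one Long and one Integer.
        List<Object> counts = Arrays.<Object>asList(1L, 1);
        System.out.println(sum(counts)); // prints 2
    }
}
```

Accumulating into a long keeps the declared result type regardless of how each individual value was boxed.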

By the way, I'm using COGROUP because UNION fails on this sample:

grunt> a = union onePhrases, otherPhrases;
grunt> describe a
2009-01-07 20:33:59,810 [main] ERROR org.apache.pig.PigServer - Cannot
cast bag with schema use: bag({t: (date: int,region: int,count:
long)}) to tuple with schema tuple
2009-01-07 20:33:59,810 [main] ERROR org.apache.pig.PigServer -
Problem resolving LOForEach schema Cannot cast bag with schema use:
bag({t: (date: int,region: int,count: long)}) to tuple with schema
tuple
2009-01-07 20:33:59,810 [main] ERROR org.apache.pig.PigServer -
Problem while casting inputs of UNION
2009-01-07 20:33:59,810 [main] ERROR org.apache.pig.PigServer - Severe
problem found during validation
org.apache.pig.impl.plan.PlanValidationException: An unexpected
exception caused the validation to stop
2009-01-07 20:33:59,811 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable
to describe schema for alias a [Cannot cast bag with schema use:
bag({t: (date: int,region: int,count: long)}) to tuple with schema
tupleProblem resolving LOForEach schema Cannot cast bag with schema
use: bag({t: (date: int,region: int,count: long)}) to tuple with
schema tupleProblem while casting inputs of UNIONSevere problem found
during validation org.apache.pig.impl.plan.PlanValidationException: An
unexpected exception caused the validation to stop]
       at org.apache.pig.PigServer.dumpSchema(PigServer.java:358)
       at org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:151)
       at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:188)
       at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
       at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
       at org.apache.pig.Main.main(Main.java:282)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: Cannot
cast bag with schema use: bag({t: (date: int,region: int,count:
long)}) to tuple with schema tupleProblem resolving LOForEach schema
Cannot cast bag with schema use: bag({t: (date: int,region: int,count:
long)}) to tuple with schema tupleProblem while casting inputs of
UNIONSevere problem found during validation
org.apache.pig.impl.plan.PlanValidationException: An unexpected
exception caused the validation to stop
       ... 6 more

2009-01-07 20:33:59,811 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - Unable to describe schema for
alias a [Cannot cast bag with schema use: bag({t: (date: int,region:
int,count: long)}) to tuple with schema tupleProblem resolving
LOForEach schema Cannot cast bag with schema use: bag({t: (date:
int,region: int,count: long)}) to tuple with schema tupleProblem while
casting inputs of UNIONSevere problem found during validation
org.apache.pig.impl.plan.PlanValidationException: An unexpected
exception caused the validation to stop]
2009-01-07 20:33:59,811 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable
to describe schema for alias a [Cannot cast bag with schema use:
bag({t: (date: int,region: int,count: long)}) to tuple with schema
tupleProblem resolving LOForEach schema Cannot cast bag with schema
use: bag({t: (date: int,region: int,count: long)}) to tuple with
schema tupleProblem while casting inputs of UNIONSevere problem found
during validation org.apache.pig.impl.plan.PlanValidationException: An
unexpected exception caused the validation to stop]


2009/1/6 Alan Gates <ga...@yahoo-inc.com>:
> What load function are you using?  Are you explicitly declaring the type to
> be long?
>
> Alan.
>
> On Jan 5, 2009, at 7:12 PM, <da...@ya.ru> <da...@ya.ru> wrote:
>
>> SUM can fail on a column declared as long. It seems to occur when
>> actual values in the column are small enough for int. They are
>> serialized and probably passed to SUM as ints.  Then the LongSum()
>> code throws exception when meets an int value.  Is it a known issue?
>
>

Re: SUM() fails on longs

Posted by Alan Gates <ga...@yahoo-inc.com>.
What load function are you using?  Are you explicitly declaring the  
type to be long?

Alan.

On Jan 5, 2009, at 7:12 PM, <da...@ya.ru> <da...@ya.ru> wrote:

> SUM can fail on a column declared as long. It seems to occur when
> actual values in the column are small enough for int. They are
> serialized and probably passed to SUM as ints.  Then the LongSum()
> code throws exception when meets an int value.  Is it a known issue?