Posted to user@pig.apache.org by Matt Tanquary <ma...@gmail.com> on 2010/12/20 18:48:59 UTC

Summing Bags

After some grouping, re-tupling, and grouping again, I end up with the
following:

grunt> describe I2L2;
I2L2: {group: (lvl2: {B::lvl2: int}),I2L2_tuple: {org.apache.pig.piggybank.evaluation.util.totuple_lvl2_18: (lvl2: {B::lvl2: int}),sum: {A::sum: long}}}

Sample Data:
({(8000),(7682)}),{({(8000),(7682)},{(3),(3)}),({(8000),(7682)},{(213),(213)})}

To finalize my output, I want to pull the group and sum up the A::sum
values.

After my proposed processing, the above record would yield:

({(8000),(7682)}),({(216),(216)});

(well, ultimately I'd like to store the data as 8000|7682,216)

I use the following Pig Latin on the I2L2 alias above:
I2L2b = FOREACH I2L2 GENERATE group, SUM($1.sum);
and get an error in the M/R job:
java.lang.ClassCastException: org.apache.pig.data.DefaultDataBag cannot be cast to java.lang.Long

I have tried various other ways to reference the A::sum values in the SUM()
function, but have had no success.
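
To make the shape of the problem concrete, here is a rough sketch in plain Python (not Pig) that models the sample record above, with bags as lists and tuples as Python tuples. It assumes, based on the describe output, that the sum field of each inner tuple is itself a bag of longs, which would explain why SUM is handed a DefaultDataBag where it expects a Long. The names naive_sum, projected, summed and line are made up for this illustration, and the position-by-position sum assumes the inner bags line up, which Pig does not guarantee because bags are unordered.

```python
# Plain-Python model of the sample record (an illustration, not Pig itself).
# Bags are modeled as lists, Pig tuples as Python tuples.
group = ([(8000,), (7682,)],)
i2l2_tuple = [
    ([(8000,), (7682,)], [(3,), (3,)]),
    ([(8000,), (7682,)], [(213,), (213,)]),
]

# $1.sum projects the "sum" field of every tuple in the bag.
# Per the schema that field is a bag, so each projected value is a bag.
projected = [t[1] for t in i2l2_tuple]


def naive_sum(values):
    """Mimic a SUM that expects every value to be a plain number."""
    total = 0
    for v in values:
        if not isinstance(v, int):
            raise TypeError(f"{type(v).__name__} cannot be used as a long")
        total += v
    return total


# Summing the projected values directly fails, much like the ClassCastException.
try:
    naive_sum(projected)
    failed = False
except TypeError:
    failed = True

# Assumption: inner bags line up by position, so sum them column by column.
columns = zip(*[[item[0] for item in bag] for bag in projected])
summed = [(sum(col),) for col in columns]

# Desired final layout: group values joined by "|", then the summed value.
line = "|".join(str(t[0]) for t in group[0]) + "," + str(summed[0][0])
```

In this model the direct sum fails because each value is a list, while summing across the inner bags gives the ({(216),(216)}) shape and the 8000|7682,216 line described above. It only pictures the data; it does not show how to express the fix in Pig Latin.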

I hope someone might be able to help me find the proper solution. Thanks!
-M@