Posted to user@pig.apache.org by pob <pe...@gmail.com> on 2011/04/24 18:02:51 UTC

SUM

x = foreach g2 generate group, data.(size);
dump x;

((drm,0),{(464868)})
((drm,1),{(464868)})
((snezz,0),{(8073),(8073)})

but:
x = foreach g2 generate group, SUM(data.size);




2011-04-24 18:02:18,910 [Thread-793] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
... 14 more
2011-04-24 18:02:19,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
2011-04-24 18:02:19,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-24 18:02:24,215 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:




Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias x

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
        at org.apache.pig.PigServer.openIterator(PigServer.java:754)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
        at org.apache.pig.Main.run(Main.java:465)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:744)
        ... 7 more

Re: SUM

Posted by pob <pe...@gmail.com>.
Even this simple thing doesn't work:

grunt> z = foreach data generate time+size;


WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)




Re: SUM

Posted by pob <pe...@gmail.com>.
This one doesn't work...



pom = foreach rows generate myUDF.toTuple($1) AS
(b:bag{t:tuple(domain:chararray,spam:int,size:long,time:float)});

2011-04-24 18:40:15,622 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1000: Error during parsing. Encountered "" at line 1, column 50
Was expecting one of:



2011/4/24 Jacob Perkins <ja...@gmail.com>

> Strange, that looks right to me. What happens if you try the 'AS'
> statement anyhow?
>
> --jacob
> @thedatachef
>
> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > Hello,
> >
> > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > describe pom
> > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> >
> > data = foreach pom generate flatten($0);
> > grunt> describe data;
> > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> >
> >
> > I think they are cast correctly, right?
> >
> > The UDF is a Python one, with the decorator
> > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
> >
> > Thanks
> >
> >
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > You're getting a 'ClassCastException' because the contents of the bags
> > > are DataByteArray and not long (or cannot be cast to long). I suspect
> > > that you're generating the contents of the bag in some way from a UDF,
> > > no?
> > >
> > > You need to either declare the output schema explicitly in the UDF or
> > > just use the 'AS' statement. For example, say you have a UDF that sums
> > > two numbers:
> > >
> > > data   = LOAD 'foobar' AS (a:int, b:int);
> > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
> > > DUMP summed;
> > >
> > > --jacob
> > > @thedatachef
> > >
> > >
> > >
> > >
>
>
>
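Building on Jacob's suggestion to declare the output schema in the UDF, a related approach is to convert every field explicitly before returning it, so the bag never carries raw bytes. The following is a minimal sketch, not pob's actual UDF. Only the schema string comes from the thread; the function body, the `_text` helper, and the input layout (a bag of `(name, value)` pairs whose values are UTF-8 text) are assumptions. The `try`/`except` fallback for `outputSchema` exists only so the module can run outside Pig, since Pig's Jython examples use the decorator without an import. It also assumes a returned list of tuples becomes a bag. How Pig 0.8's Jython bridge maps a Python `int` onto the declared `long` field is not settled here, so treat that part as untested.

```python
# Hypothetical Jython UDF (called as myUDF.toTuple in the thread). Only the
# output schema string comes from the thread; the body is an assumption.

try:
    outputSchema  # provided by Pig when the script runs as a Jython UDF
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the module also runs in plain Python."""
        def wrap(func):
            return func
        return wrap


def _text(value):
    """Return value as text, decoding raw bytes as UTF-8 (assumed encoding)."""
    if isinstance(value, str):
        return value
    if hasattr(value, "decode"):
        return value.decode("utf-8")
    return str(value)


@outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
def toTuple(columns):
    """Build a one-tuple bag from (name, value) pairs, converting every field."""
    row = dict((_text(name), value) for name, value in columns)
    return [(
        _text(row["domain"]),
        int(_text(row["spam"])),
        int(_text(row["size"])),   # declared long; Jython-to-Pig mapping untested
        float(_text(row["time"])),
    )]
```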

Re: SUM

Posted by pob <pe...@gmail.com>.
I think if I switch to 0.9, something else will break with Cassandra.

2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>

> I think it's the deep-casting issue from
> https://issues.apache.org/jira/browse/PIG-1758 .
> Should work in 0.9 but didn't get into 0.8 or 0.8.1
>
> D
>
> On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
>
> > That's strange, pygmalion works fine (but there aren't any numerical
> > operations).
> >
> > I think I'm using C* 0.7.5, where it's supposed to be patched, so I don't know :(
> >
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > That changes things entirely. There's some weirdness in the way data is
> > > read from Cassandra. Have you applied the latest patches (e.g.
> > > https://issues.apache.org/jira/browse/CASSANDRA-2387)?
> > >
> > > See also some UDFs for working with Cassandra data that Jeremy Hanna
> > > (@jeromatron) wrote:
> > >
> > > https://github.com/jeromatron/pygmalion
> > >
> > >
> > > Best of luck!
> > >
> > > --jacob
> > > @thedatachef
> > >
> > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > Maybe I forgot one more thing: the rows are taken from Cassandra.
> > > >
> > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > > >
> > > > I have no idea how to format AS for bag in foreach.
> > > >
> > > >
> > > > P.

Re: SUM

Posted by Jeremy Hanna <je...@gmail.com>.
It looks like you just need a validation class in Cassandra, and then it should work. That is, look at the column_metadata for the num_heads column below.

For example, here's the script to create the data in the Cassandra CLI:
create keyspace pygmalion;
use pygmalion;
create column family account with comparator = UTF8Type and default_validation_class = UTF8Type and column_metadata=[{column_name: num_heads, validation_class: LongType}];
create column family betelgeuse with comparator = UTF8Type and default_validation_class = UTF8Type;

set account['hipcat']['first_name'] = 'Zaphod';
set account['hipcat']['last_name'] = 'Beeblebrox';
set account['hipcat']['birth_place'] = 'Betelgeuse Five';
set account['hipcat']['num_heads'] = '2';

set account['hoopyfrood']['first_name'] = 'Ford';
set account['hoopyfrood']['last_name'] = 'Prefect';
set account['hoopyfrood']['birth_place'] = 'Betelgeuse Five';
set account['hoopyfrood']['num_heads'] = '1';

set account['earthman']['first_name'] = 'Arthur';
set account['earthman']['last_name'] = 'Dent';
set account['earthman']['birth_place'] = 'Earth';
set account['earthman']['num_heads'] = '1';


And here's the pig script:
register '/Users/jeremyhanna/Work/pygmalion/udf/target/pygmalion-1.0.0-SNAPSHOT.jar';
raw =  LOAD 'cassandra://pygmalion/account' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)});
rows = FOREACH raw GENERATE key, FLATTEN(org.pygmalion.udf.FromCassandraBag('first_name, last_name, birth_place, num_heads', columns)) AS (
    first_name:chararray,
    last_name:chararray,
    birth_place:chararray,
    num_heads:long
);
b = group rows by key;
x = foreach b generate group, SUM(rows.num_heads);
dump x;


That works and returns:
(hipcat,2)
(earthman,1)
(hoopyfrood,1)

That should work the same way with your Python UDF.
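The validation class matters because a raw Cassandra column value is just bytes, and a loader can only turn it into a number if it knows how those bytes are encoded. As an illustration only (not CassandraStorage's actual code; `decode_column` and `decode_row` are hypothetical names), this sketch decodes values the way the CLI script above describes them: `UTF8Type` as UTF-8 text, `LongType` as an 8-byte big-endian signed integer, and anything else left as raw bytes, which is what Pig would see as a bytearray.

```python
import struct


def decode_column(raw, validation_class):
    """Decode one raw column value according to its validation class."""
    if validation_class == "UTF8Type":
        return raw.decode("utf-8")
    if validation_class == "LongType":
        # LongType stores an 8-byte, big-endian, signed integer.
        if len(raw) != 8:
            raise ValueError("expected an 8-byte LongType value")
        return struct.unpack(">q", raw)[0]
    # BytesType or no metadata: nothing to go on, so keep the raw bytes.
    return bytes(raw)


def decode_row(columns, column_metadata, default_validation_class="BytesType"):
    """Decode (name, value) pairs; per-column metadata overrides the default."""
    decoded = {}
    for name, raw in columns:
        key = name.decode("utf-8")  # matches comparator = UTF8Type
        cls = column_metadata.get(key, default_validation_class)
        decoded[key] = decode_column(raw, cls)
    return decoded


# Mirrors the 'account' column family: UTF8Type by default, num_heads as LongType.
row = decode_row(
    [(b"first_name", b"Zaphod"), (b"num_heads", struct.pack(">q", 2))],
    {"num_heads": "LongType"},
    default_validation_class="UTF8Type",
)
print(row)  # {'first_name': 'Zaphod', 'num_heads': 2}
```

Per-column metadata takes precedence over default_validation_class here, mirroring how the CLI script above sets a UTF8Type default and overrides it only for num_heads.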

On Apr 25, 2011, at 9:59 AM, Jeremy Hanna wrote:

> Sorry - I've been kind of out of it this weekend.  Talking about it on IRC.  What I'd like to do is get a small set of data and a script that can reproduce what you're trying to do and then try various things in my own environment.  That way we can more easily log a Cassandra ticket if it can't be worked into what's currently there.  I'll respond to this thread when we have something to go forward with.
> 
> On Apr 24, 2011, at 3:28 PM, Dmitriy Ryaboy wrote:
> 
>> Sigh. @jeromatron , @thedatachef -- this one's on you :). Toldya you need
>> the LoadCaster...
>> 
>> 
>> D
>> 
>> On Sun, Apr 24, 2011 at 1:17 PM, pob <pe...@gmail.com> wrote:
>> 
>>> hello,
>>> 
>>> Thanks, but without success ;/
>>> 
>>> 
>>> grunt> pom = foreach rows generate myUDF.toTuple($1);
>>> grunt> describe pom
>>> pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time: bytearray)}}
>>> grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>>> grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
>>> grunt> describe data;
>>> data: {domain: chararray,spam: int,size: long,time: float}
>>> 
>>> z = foreach data generate time+size;
>>> 
>>> 
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>> 2011-04-24 22:16:06,129 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
>>> 
>>> 
>>> 
>>> 
>>> z = foreach data generate time;
>>> 
>>> 
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a
>>> bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>> 
>>> 
>>> 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
>>> 
>>>> Try this:
>>>> 
>>>> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>>>> data = foreach data generate (chararray) domain, (int) spam, (long) size,
>>>> (float) time;
>>>> 
>>>> Pig is inconsistent in what "as foo:type" does vs. "(type) foo".
>>>> 
>>>> D
>>>> 
>>>> On Sun, Apr 24, 2011 at 10:44 AM, pob <pe...@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> but why can't I re-cast it during flatten?
>>>>> 
>>>>> 
>>>>> data = foreach pom generate flatten($0) AS (domain:chararray, spam:int, size:long, time:float);
>>>>> 
>>>>> grunt> z = foreach data generate time+size;
>>>>> 
>>>>> 
>>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>> 
>>>>> 
>>>>> 
>>>>> 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
>>>>> 
>>>>>> I think it's the deep-casting issue from
>>>>>> https://issues.apache.org/jira/browse/PIG-1758 .
>>>>>> It should work in 0.9, but the fix didn't make it into 0.8 or 0.8.1.
>>>>>> 
>>>>>> D
>>>>>> 
>>>>>> On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
>>>>>> 
>>>>>>> That's strange; pygmalion works fine (but there aren't any numerical
>>>>>>> operations there).
>>>>>>> 
>>>>>>> I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so I don't know :(
>>>>>>> 
>>>>>>> 
>>>>>>> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>>>>>>> 
>>>>>>>> That changes things entirely. There's some weirdness in the way data is
>>>>>>>> read from Cassandra. Have you applied the latest patches (e.g.
>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-2387)?
>>>>>>>> 
>>>>>>>> See also some UDFs for working with Cassandra data that Jeremy Hanna
>>>>>>>> (@jeromatron) wrote:
>>>>>>>> 
>>>>>>>> https://github.com/jeromatron/pygmalion
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best of luck!
>>>>>>>> 
>>>>>>>> --jacob
>>>>>>>> @thedatachef
>>>>>>>> 
>>>>>>>> On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
>>>>>>>>> Maybe I forgot one more thing: rows are taken from Cassandra.
>>>>>>>>> 
>>>>>>>>> rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
>>>>>>>>> CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
>>>>>>>>> 
>>>>>>>>> I have no idea how to write the AS clause for a bag in a foreach.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> P.
>>>>>>>>> 
>>>>>>>>> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>>>>>>>>> 
>>>>>>>>>> Strange, that looks right to me. What happens if you try the 'AS'
>>>>>>>>>> statement anyhow?
>>>>>>>>>> 
>>>>>>>>>> --jacob
>>>>>>>>>> @thedatachef
>>>>>>>>>> 
>>>>>>>>>> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>> 
>>>>>>>>>>> pom = foreach rows generate myUDF.toTuple($1); -- reading data
>>>>>>>>>>> describe pom
>>>>>>>>>>> pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
>>>>>>>>>>> 
>>>>>>>>>>> data = foreach pom generate flatten($0);
>>>>>>>>>>> grunt> describe data;
>>>>>>>>>>> data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I think they are cast correctly, right?
>>>>>>>>>>> 
>>>>>>>>>>> The UDF is a Python one with the decorator
>>>>>>>>>>> @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>>>>>>>>>>> 
>>>>>>>>>>>> You're getting a 'ClassCastException' because the contents of the bags
>>>>>>>>>>>> are DataByteArray and not long (or cannot be cast to long). I suspect
>>>>>>>>>>>> that you're generating the contents of the bag in some way from a UDF,
>>>>>>>>>>>> no?
>>>>>>>>>>>> 
>>>>>>>>>>>> You need to either declare the output schema explicitly in the UDF or
>>>>>>>>>>>> just use the 'AS' statement. For example, say you have a UDF that sums
>>>>>>>>>>>> two numbers:
>>>>>>>>>>>> 
>>>>>>>>>>>> data   = LOAD 'foobar' AS (a:int, b:int);
>>>>>>>>>>>> summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
>>>>>>>>>>>> DUMP summed;
>>>>>>>>>>>> 
>>>>>>>>>>>> --jacob
>>>>>>>>>>>> @thedatachef
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
>>>>>>>>>>>>> x = foreach g2 generate group, data.(size);
>>>>>>>>>>>>> dump x;
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ((drm,0),{(464868)})
>>>>>>>>>>>>> ((drm,1),{(464868)})
>>>>>>>>>>>>> ((snezz,0),{(8073),(8073)})
>>>>>>>>>>>>> 
>>>>>>>>>>>>> but:
>>>>>>>>>>>>> x = foreach g2 generate group, SUM(data.size);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2011-04-24 18:02:18,910 [Thread-793] WARN
>>>>>>>>>>>>> org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
>>>>>>>>>>>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error
>>>>>>>>>>>>> while computing sum in Initial
>>>>>>>>>>>>> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
>>>>>>>>>>>>> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>>>>>>>>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>>>>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>>>>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>>>>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>>>>>>>> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray
>>>>>>>>>>>>> cannot be cast to java.lang.Long
>>>>>>>>>>>>> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
>>>>>>>>>>>>> ... 14 more
>>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - HadoopJobId: job_local_0038
>>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - 0% complete
>>>>>>>>>>>>> 2011-04-24 18:02:24,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - job job_local_0038 has failed! Stop running all dependent jobs
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - 100% complete
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats
>>>>>>>>>>>>> - Detected Local mode. Stats reported below may be incomplete
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats
>>>>>>>>>>>>> - Script Statistics:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Pig Stack Trace
>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>> ERROR 1066: Unable to open iterator for alias x
>>>>>>>>>>>>> 
>>>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
>>>>>>>>>>>>> open iterator for alias x
>>>>>>>>>>>>>       at org.apache.pig.PigServer.openIterator(PigServer.java:754)
>>>>>>>>>>>>>       at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>>>>>>>>>       at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>>>>>>>>>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>>>>>>>>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>>>>>>>>>       at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>>>>>>>>>       at org.apache.pig.Main.run(Main.java:465)
>>>>>>>>>>>>>       at org.apache.pig.Main.main(Main.java:107)
>>>>>>>>>>>>> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>>>>>>>>>>>>>       at org.apache.pig.PigServer.openIterator(PigServer.java:744)
>>>>>>>>>>>>>       ... 7 more
> 


Re: SUM

Posted by Jeremy Hanna <je...@gmail.com>.
Sorry - I've been kind of out of it this weekend.  Talking about it on IRC.  What I'd like to do is get a small set of data and a script that can reproduce what you're trying to do and then try various things in my own environment.  That way we can more easily log a Cassandra ticket if it can't be worked into what's currently there.  I'll respond to this thread when we have something to go forward with.



Re: SUM

Posted by pob <pe...@gmail.com>.
With Pig 0.9.0:

grunt> data = foreach data generate (chararray) domain, (long) spam, (long) size, (long) time;
2011-04-25 00:48:15,093 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
grunt> describe data;
2011-04-25 00:48:20,354 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
data: {domain: chararray,spam: long,size: long,time: long}
grunt> z = foreach data generate time+size;
2011-04-25 00:48:31,557 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).




org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a
bytearray from the UDF. Cannot determine how to convert the bytearray to float.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:534)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:341)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.genericGetNext(Add.java:84)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:119)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
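
One possible workaround for the bytearray casts above, offered only as a sketch under assumptions: rather than relying on Pig to cast the raw Cassandra values (which needs a LoadCaster that the loader does not seem to provide here), convert them to real numbers inside the Python UDF itself, so what the UDF returns actually matches its declared output schema. The function name `to_typed_row`, the column names, and the assumption that names and values arrive as UTF-8 text in bytes-like objects are all hypothetical. The dumped sizes such as 464868 suggest text rather than binary longs. The code is written against Python 3 semantics. The exact Python type that Pig's Jython bridge passes for a bytearray should be checked, and `_text` adapted if needed.

```python
# Hypothetical sketch: convert raw Cassandra column values inside the UDF
# so Pig never has to cast a bytearray. Assumes names/values are UTF-8 text
# held in bytes-like objects (or already str) and that the columns are
# named domain/spam/size/time.

try:
    outputSchema  # assumed to be provided by Pig when run as a Jython UDF
except NameError:
    def outputSchema(schema):
        # No-op stand-in so the module also runs under plain Python.
        def decorate(func):
            return func
        return decorate


def _text(value):
    """Decode a raw column name or value (bytes-like or str) to str."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode("utf-8")
    return str(value)


@outputSchema("t:tuple(domain:chararray, spam:int, size:long, time:float)")
def to_typed_row(columns):
    """Turn a bag of (name, value) pairs into one tuple of real Python types."""
    row = {}
    for name, value in columns:
        row[_text(name)] = _text(value)
    # Convert explicitly; int covers both Pig int and long on the Python side.
    return (
        row["domain"],
        int(row["spam"]),
        int(row["size"]),
        float(row["time"]),
    )
```

If this approach fits, the file would be registered with something like `REGISTER 'typed_row.py' USING jython AS myUDF;` (the file name is hypothetical) and called as `flatten(myUDF.to_typed_row($1))`. Converting inside the UDF avoids the cast entirely. The cost is that the column types are hard-coded in Python.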




Re: SUM

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Sigh. @jeromatron, @thedatachef -- this one's on you :). Toldya you need
the LoadCaster...


D
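
Dmitriy's two points in this thread, that `as foo:type` and `(type) foo` do different things and that turning a bytearray into a number needs a LoadCaster, can be illustrated with a small toy model. This is a simplified Python sketch of the behaviour reported here, not Pig's code: declaring a type only relabels the value, while a cast really converts it, and converting raw bytes needs a caster from whatever produced them. The names `declare`, `cast`, `utf8_caster` and `CastError` are invented for illustration.

```python
# Toy model (not Pig's implementation) of the two behaviours discussed in this thread.


class CastError(Exception):
    """Plays the role of Pig's ERROR 1075 in this model."""


def declare(value, py_type):
    """Like `AS foo:type` here: the schema says py_type, but the value is left untouched."""
    return value


def utf8_caster(raw, py_type):
    """Hypothetical caster for text-encoded bytes, in the spirit of a loader's LoadCaster."""
    return py_type(raw.decode("utf-8"))


def cast(value, py_type, caster=None):
    """Like `(type) foo`: really convert, which for raw bytes requires a caster."""
    if isinstance(value, (bytes, bytearray)):
        if caster is None:
            # Bytes with no caster attached (e.g. produced by a UDF): encoding unknown.
            raise CastError("Cannot determine how to convert the bytearray to "
                            + py_type.__name__)
        return caster(bytes(value), py_type)
    return py_type(value)


# Declaring a type does not change the runtime value: it is still bytes.
print(type(declare(b"1.5", float)).__name__)  # bytes

# Casting bytes that carry no caster fails, much like ERROR 1075.
try:
    cast(b"1.5", float)
except CastError as exc:
    print(exc)  # Cannot determine how to convert the bytearray to float

# With a caster available, the conversion and the addition both work.
print(cast(b"1.5", float, utf8_caster) + cast(b"8073", int, utf8_caster))  # 8074.5
```

In this model, pob's `flatten($0) AS (... time:float)` corresponds to `declare` (hence the ClassCastException at `time+size`), and the explicit `(float) time` corresponds to `cast` without a caster (hence ERROR 1075).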

On Sun, Apr 24, 2011 at 1:17 PM, pob <pe...@gmail.com> wrote:

> hello,
>
> thanks, but without success ;/
>
>
> grunt> pom = foreach rows generate myUDF.toTuple($1);
> grunt> describe pom
> pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time: bytearray)}}
> grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
> grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size,
> >> (float) time;
> grunt> describe data;
> data: {domain: chararray,spam: int,size: long,time: float}
>
> z = foreach data generate time+size;
>
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 2011-04-24 22:16:06,129 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
>
>
>
>
> z = foreach data generate time
>
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
>
> 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
>
> > Try this:
> >
> > data = foreach pom generate flatten($0) as (domain, spam, size, time);
> > data = foreach data generate (chararray) domain, (int) spam, (long) size,
> > (float) time;
> >
> > Pig is inconsistent in what "as foo:type" does vs. "(type) foo".
> >
> > D
> >
> > On Sun, Apr 24, 2011 at 10:44 AM, pob <pe...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > but why can't I re-cast it during flatten?
> > >
> > >
> > > data = foreach pom generate flatten($0) AS (domain:chararray, spam:int,
> > > size:long, time:float);
> > >
> > > grunt> z = foreach data generate time+size;
> > >
> > >
> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
> > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
> > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > >
> > >
> > >
> > > 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> > >
> > > > I think it's the deep-casting issue from
> > > > https://issues.apache.org/jira/browse/PIG-1758 .
> > > > It should work in 0.9, but it didn't get into 0.8 or 0.8.1.
> > > >
> > > > D
> > > >
> > > > On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
> > > >
> > > > > That's strange, pygmalion works fine (but there aren't any numerical
> > > > > operations).
> > > > >
> > > > > I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so I don't know :(
> > > > >
> > > > >
> > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > >
> > > > > > That changes things entirely. There's some weirdness in the way data is
> > > > > > read from Cassandra. Have you applied the latest patches (e.g.
> > > > > > https://issues.apache.org/jira/browse/CASSANDRA-2387)?
> > > > > >
> > > > > > See also some UDFs for working with Cassandra data that Jeremy Hanna
> > > > > > (@jeromatron) wrote:
> > > > > >
> > > > > > https://github.com/jeromatron/pygmalion
> > > > > >
> > > > > >
> > > > > > Best of luck!
> > > > > >
> > > > > > --jacob
> > > > > > @thedatachef
> > > > > >
> > > > > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > > > > Maybe I forgot one more thing: rows are taken from Cassandra.
> > > > > > >
> > > > > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > > > > > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > > > > > >
> > > > > > > I have no idea how to write the AS clause for a bag in foreach.
> > > > > > >
> > > > > > >
> > > > > > > P.
> > > > > > >
> > > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > > >
> > > > > > > > Strange, that looks right to me. What happens if you try the 'AS'
> > > > > > > > statement anyhow?
> > > > > > > >
> > > > > > > > --jacob
> > > > > > > > @thedatachef
> > > > > > > >
> > > > > > > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > > > > > > > describe pom
> > > > > > > > > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> > > > > > > > >
> > > > > > > > > data = foreach pom generate flatten($0);
> > > > > > > > > grunt> describe data;
> > > > > > > > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think they are cast fine, right?
> > > > > > > > >
> > > > > > > > > The UDF is a Python one with the decorator
> > > > > > > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > > > > >
> > > > > > > > > > You're getting a 'ClassCastException' because the contents of the bags
> > > > > > > > > > are DataByteArray and not long (or cannot be cast to long). I suspect
> > > > > > > > > > that you're generating the contents of the bag in some way from a UDF,
> > > > > > > > > > no?
> > > > > > > > > >
> > > > > > > > > > You need to either declare the output schema explicitly in the UDF or
> > > > > > > > > > just use the 'AS' statement. For example, say you have a UDF that sums
> > > > > > > > > > two numbers:
> > > > > > > > > >
> > > > > > > > > > data   = LOAD 'foobar' AS (a:int, b:int);
> > > > > > > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
> > > > > > > > > > DUMP summed;
> > > > > > > > > >
> > > > > > > > > > --jacob
> > > > > > > > > > @thedatachef
> > > > > > > > > >
> > > > > > > > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > > > > > > > x = foreach g2 generate group, data.(size);
> > > > > > > > > > > dump x;
> > > > > > > > > > >
> > > > > > > > > > > ((drm,0),{(464868)})
> > > > > > > > > > > ((drm,1),{(464868)})
> > > > > > > > > > > ((snezz,0),{(8073),(8073)})
> > > > > > > > > > >
> > > > > > > > > > > but:
> > > > > > > > > > > x = foreach g2 generate group, SUM(data.size);
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2011-04-24 18:02:18,910 [Thread-793] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> > > > > > > > > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
> > > > > > > > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> > > > > > > > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > > > > > > > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > > > > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > > > > > > > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > > > > > > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
> > > > > > > > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
> > > > > > > > > > > ... 14 more
> > > > > > > > > > > 2011-04-24 18:02:19,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
> > > > > > > > > > > 2011-04-24 18:02:19,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> > > > > > > > > > > 2011-04-24 18:02:24,215 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
> > > > > > > > > > > 2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> > > > > > > > > > > 2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> > > > > > > > > > > 2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
> > > > > > > > > > > 2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Pig Stack Trace
> > > > > > > > > > > ---------------
> > > > > > > > > > > ERROR 1066: Unable to open iterator for alias x
> > > > > > > > > > >
> > > > > > > > > > > org.apache.pig.impl.logicalLayer.FrontendException:
> ERROR
> > > > 1066:
> > > > > > > > Unable to
> > > > > > > > > > > open iterator for alias x
> > > > > > > > > > >         at
> > > > > > org.apache.pig.PigServer.openIterator(PigServer.java:754)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > > > > > > > > >         at
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> > > > > > > > > > >         at
> > > > org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> > > > > > > > > > >         at org.apache.pig.Main.run(Main.java:465)
> > > > > > > > > > >         at org.apache.pig.Main.main(Main.java:107)
> > > > > > > > > > > Caused by: java.io.IOException: Job terminated with
> > > anomalous
> > > > > > status
> > > > > > > > > > FAILED
> > > > > > > > > > >         at
> > > > > > org.apache.pig.PigServer.openIterator(PigServer.java:744)
> > > > > > > > > > >         ... 7 more
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: SUM

Posted by pob <pe...@gmail.com>.
Hello,

Thanks, but without success ;/


grunt> pom = foreach rows generate myUDF.toTuple($1);
grunt> describe pom
pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time: bytearray)}}
grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size,
>> (float) time;
grunt> describe data;
data: {domain: chararray,spam: int,size: long,time: float}

z = foreach data generate time+size;


org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-04-24 22:16:06,129 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs




z = foreach data generate time;


org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)


2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>

> Try this:
>
> data = foreach pom generate flatten($0) as (domain, spam, size, time);
> data = foreach data generate (chararray) domain, (int) spam, (long) size,
> (float) time;
>
> Pig is inconsistent in what "as foo:type" does vs. "(type) foo".
>
> D

Re: SUM

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Try this:

data = foreach pom generate flatten($0) as (domain, spam, size, time);
data = foreach data generate (chararray) domain, (int) spam, (long) size,
(float) time;

Pig is inconsistent in what "as foo:type" does vs. "(type) foo".

D


Re: SUM

Posted by pob <pe...@gmail.com>.
Hi,

But why can't I re-cast it during flatten?


data = foreach pom generate flatten($0) AS (domain:chararray, spam:int,
size:long, time:float);

grunt> z = foreach data generate time+size;


java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)



2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>

> I think it's the deep-casting issue from
> https://issues.apache.org/jira/browse/PIG-1758 .
> Should work in 0.9 but didn't get into 0.8 or 0.8.1
>
> D
>
> On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
>
> > That's strange; pygmalion works fine (but there aren't any numerical
> > operations).
> >
> > I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so idk :(
> >
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > That changes things entirely. There's some weirdness in the way data is
> > > read from Cassandra. Have you applied the latest patches (e.g.
> > > https://issues.apache.org/jira/browse/CASSANDRA-2387)?
> > >
> > > See also some UDFs for working with Cassandra data that Jeremy Hanna
> > > (@jeromatron) wrote:
> > >
> > > https://github.com/jeromatron/pygmalion
> > >
> > >
> > > Best of luck!
> > >
> > > --jacob
> > > @thedatachef
> > >
> > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > Maybe I forget one more thing, rows are taken from Cassandra.
> > > >
> > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > > >
> > > > I have no idea how to format AS for bag in foreach.
> > > >
> > > >
> > > > P.
> > > >
> > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > >
> > > > > Strange, that looks right to me. What happens if you try the 'AS'
> > > > > statement anyhow?
> > > > >
> > > > > --jacob
> > > > > @thedatachef
> > > > >
> > > > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > > > Hello,
> > > > > >
> > > > > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > > > > describe pom
> > > > > > pom: {y: {t: (domain: chararray,spam: int,size: long,time:
> float)}}
> > > > > >
> > > > > > data = foreach pom generate flatten($0);
> > > > > > grunt> describe data;
> > > > > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time:
> > > float}
> > > > > >
> > > > > >
> > > > > > I thing they are casted fine, right?
> > > > > >
> > > > > > UDF is python one with decorator
> > > > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int,
> size:long,
> > > > > > time:float)}")
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > >
> > > > > > > You're getting a 'ClassCastException' because the contents of
> the
> > > bags
> > > > > > > are DataByteArray and not long (or cannot be cast to long). I
> > > suspect
> > > > > > > that you're generating the contents of the bag in some way from
> a
> > > UDF,
> > > > > > > no?
> > > > > > >
> > > > > > > You need to either declare the output schema explicitly in the
> > UDF
> > > or
> > > > > > > just use the 'AS' statement. For example, say you have a UDF
> that
> > > sums
> > > > > > > two numbers:
> > > > > > >
> > > > > > > data   = LOAD 'foobar' AS (int:a, int:b);
> > > > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS
> > (sum:int);
> > > > > > > DUMP summed;
> > > > > > >
> > > > > > > --jacob
> > > > > > > @thedatachef
> > > > > > >
> > > > > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > > > > x = foreach g2 generate group, data.(size);
> > > > > > > > dump x;
> > > > > > > >
> > > > > > > > ((drm,0),{(464868)})
> > > > > > > > ((drm,1),{(464868)})
> > > > > > > > ((snezz,0),{(8073),(8073)})
> > > > > > > >
> > > > > > > > but:
> > > > > > > > x = foreach g2 generate group, SUM(data.size);
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2011-04-24 18:02:18,910 [Thread-793] WARN
> > > > > > > >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> > > > > > > > org.apache.pig.backend.executionengine.ExecException: ERROR
> > 2106:
> > > > > Error
> > > > > > > > while computing sum in Initial
> > > > > > > > at
> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> > > > > > > > at
> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > > > > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > > > at
> > > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > > > > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > at
> > > > > > >
> > > > >
> > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > > > > > Caused by: java.lang.ClassCastException:
> > > > > > > org.apache.pig.data.DataByteArray
> > > > > > > > cannot be cast to java.lang.Long
> > > > > > > > at
> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
> > > > > > > > ... 14 more
> > > > > > > > 2011-04-24 18:02:19,213 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - HadoopJobId: job_local_0038
> > > > > > > > 2011-04-24 18:02:19,213 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - 0% complete
> > > > > > > > 2011-04-24 18:02:24,215 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - job job_local_0038 has failed! Stop running all dependent
> > jobs
> > > > > > > > 2011-04-24 18:02:24,216 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - 100% complete
> > > > > > > > 2011-04-24 18:02:24,216 [main] ERROR
> > > > > > > > org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce
> > job(s)
> > > > > failed!
> > > > > > > > 2011-04-24 18:02:24,216 [main] INFO
> > > > > > >  org.apache.pig.tools.pigstats.PigStats
> > > > > > > > - Detected Local mode. Stats reported below may be incomplete
> > > > > > > > 2011-04-24 18:02:24,216 [main] INFO
> > > > > > >  org.apache.pig.tools.pigstats.PigStats
> > > > > > > > - Script Statistics:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Pig Stack Trace
> > > > > > > > ---------------
> > > > > > > > ERROR 1066: Unable to open iterator for alias x
> > > > > > > >
> > > > > > > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR
> 1066:
> > > > > Unable to
> > > > > > > > open iterator for alias x
> > > > > > > >         at
> > > org.apache.pig.PigServer.openIterator(PigServer.java:754)
> > > > > > > >         at
> > > > > > > >
> > > > >
> > >
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> > > > > > > >         at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> > > > > > > >         at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > > > > > >         at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> > > > > > > >         at
> org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> > > > > > > >         at org.apache.pig.Main.run(Main.java:465)
> > > > > > > >         at org.apache.pig.Main.main(Main.java:107)
> > > > > > > > Caused by: java.io.IOException: Job terminated with anomalous
> > > status
> > > > > > > FAILED
> > > > > > > >         at
> > > org.apache.pig.PigServer.openIterator(PigServer.java:744)
> > > > > > > >         ... 7 more
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
> >
>

Re: SUM

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I think it's the deep-casting issue from
https://issues.apache.org/jira/browse/PIG-1758 .
It should work in 0.9, but the fix didn't make it into 0.8 or 0.8.1.

D
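The thread does not spell out what "deep casting" means. Roughly, it is converting values nested inside complex types, such as each field of each tuple in a bag, instead of only the outer value; that is the conversion Dmitriy suspects was missing for `SUM(data.size)` here. The plain-Python sketch below only illustrates the idea and is not Pig's implementation: `deep_cast`, the `CONVERTERS` table and the sample `time` values are hypothetical, and raw values are assumed to be UTF-8 text.

```python
# Illustration only (not Pig's code): a "deep" cast converts every field of every
# tuple inside a bag, instead of just relabelling the bag's declared schema.


def _text(raw):
    """Decode bytes as UTF-8 (assumed encoding); pass anything else through str()."""
    if isinstance(raw, (bytes, bytearray)):
        return raw.decode("utf-8")
    return str(raw)


# Pig-style type names mapped to Python converters (Python 3 int covers int and long).
CONVERTERS = {
    "chararray": _text,
    "int": lambda raw: int(_text(raw)),
    "long": lambda raw: int(_text(raw)),
    "float": lambda raw: float(_text(raw)),
}


def deep_cast(bag, field_types):
    """Return a new bag whose tuples have each field converted to field_types."""
    cast_bag = []
    for tup in bag:
        if len(tup) != len(field_types):
            raise ValueError("tuple %r does not have %d fields" % (tup, len(field_types)))
        cast_bag.append(tuple(CONVERTERS[t](v) for t, v in zip(field_types, tup)))
    return cast_bag


# The sizes come from the thread's (snezz,0) group; the time values are made up.
raw_bag = [(b"snezz", b"0", b"8073", b"0.5"), (b"snezz", b"0", b"8073", b"0.5")]
typed = deep_cast(raw_bag, ["chararray", "int", "long", "float"])
print(sum(size for _, _, size, _ in typed))  # 16146
```

Once the fields really are numbers, summing the size column is ordinary arithmetic; without the conversion, the bytearray values are what `LongSum` ends up trying to cast.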


Re: SUM

Posted by pob <pe...@gmail.com>.
That's strange; pygmalion works fine (but there aren't any numerical
operations there).

I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so I don't know :(



Re: SUM

Posted by Jacob Perkins <ja...@gmail.com>.
That changes things entirely. There's some weirdness in the way data is
read from Cassandra. Have you applied the latest patches (e.g.
https://issues.apache.org/jira/browse/CASSANDRA-2387)?

See also some UDFs for working with Cassandra data that Jeremy Hanna
(@jeromatron) wrote:

https://github.com/jeromatron/pygmalion


Best of luck!

--jacob
@thedatachef




Re: SUM

Posted by pob <pe...@gmail.com>.
Maybe I forgot to mention one more thing: the rows are loaded from Cassandra.

rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});

I have no idea how to write the AS clause for a bag in a FOREACH.
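In the LOAD above, `name` and `value` are given no types, so Pig treats them as bytearray. A minimal sketch of one way to handle that inside the Python UDF: convert each column value explicitly before returning it. It assumes the values arrive as UTF-8 byte strings and that the Cassandra column names match the fields in the UDF's schema (`domain`, `spam`, `size`, `time`); `as_text`, `columns_to_record` and the type mapping are hypothetical, and how Pig's Jython bridge actually hands a bytearray to Python may differ by version.

```python
# Assumed mapping from Pig field types to the Python types the UDF returns.
PY_TYPE = {"chararray": str, "int": int, "long": int, "float": float, "double": float}


def as_text(raw):
    """Decode a raw column name or value, assuming UTF-8 byte strings."""
    if isinstance(raw, (bytes, bytearray)):
        return raw.decode("utf-8")
    return raw


def columns_to_record(columns, fields):
    """Hypothetical helper: turn a bag of (name, value) column tuples into one
    typed tuple, ordered and typed by `fields`, a list of (field, pig_type)."""
    by_name = {as_text(name): value for name, value in columns}
    return tuple(PY_TYPE[pig_type](as_text(by_name[field])) for field, pig_type in fields)


# Field order and types copied from the UDF's declared tuple schema.
FIELDS = [("domain", "chararray"), ("spam", "int"), ("size", "long"), ("time", "float")]
```

Returning real Python ints and floats this way means the values match what the `@outputSchema` decorator promises, instead of leaving raw bytes behind a `long` label.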


P.

2011/4/24 Jacob Perkins <ja...@gmail.com>

> Strange, that looks right to me. What happens if you try the 'AS'
> statement anyhow?
>
> --jacob
> @thedatachef

Re: SUM

Posted by Jacob Perkins <ja...@gmail.com>.
Strange, that looks right to me. What happens if you try the 'AS'
statement anyhow?

--jacob
@thedatachef

On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> Hello,
> 
> pom = foreach rows generate myUDF.toTuple($1); -- reading data
> describe pom
> pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> 
> data = foreach pom generate flatten($0);
> grunt> describe data;
> data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> 
> 
> I think they are cast correctly, right?
> 
> The UDF is a Python one, with the decorator
> @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
> time:float)}")
> 
> Thanks



Re: SUM

Posted by pob <pe...@gmail.com>.
Hello,

pom = foreach rows generate myUDF.toTuple($1); -- reading data
describe pom
pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}

data = foreach pom generate flatten($0);
grunt> describe data;
data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}


I think they are cast correctly, right?

The UDF is a Python one, with the decorator
@outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
time:float)}")
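DESCRIBE only reports the schema the UDF declares in its decorator. As far as I can tell, Pig trusts that declaration rather than converting what the UDF returns, so `size` can be described as `long` while the runtime objects are still bytearray, which would fit the ClassCastException in LongSum. A rough plain-Python sketch (not part of Pig) for checking a UDF's output against its declared tuple schema; the Pig-to-Python type mapping and the helper names `tuple_fields` and `type_mismatches` are assumptions for illustration.

```python
import re

# Declared schema copied from the @outputSchema decorator above.
SCHEMA = "y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}"

# Assumed mapping from Pig scalar types to the Python types a UDF should return.
EXPECTED = {"chararray": str, "int": int, "long": int, "float": float, "double": float}


def tuple_fields(schema):
    """Return (name, pig_type) pairs from the innermost tuple(...) of a schema string."""
    inner = re.search(r"tuple\(([^()]*)\)", schema).group(1)
    pairs = []
    for part in inner.split(","):
        name, pig_type = part.split(":")
        pairs.append((name.strip(), pig_type.strip()))
    return pairs


def type_mismatches(bag, schema):
    """List (row_index, field, declared_type, python_type) for every value whose
    Python type does not match the declared Pig type."""
    fields = tuple_fields(schema)
    problems = []
    for row_index, row in enumerate(bag):
        for (name, pig_type), value in zip(fields, row):
            if type(value) is not EXPECTED[pig_type]:
                problems.append((row_index, name, pig_type, type(value).__name__))
    return problems


print(type_mismatches([("snezz", 0, b"8073", 0.5)], SCHEMA))
# [(0, 'size', 'long', 'bytes')]
```

An empty list means every returned value has the Python type the declared schema implies; anything else points at the field that would trip a typed builtin such as SUM.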

Thanks



2011/4/24 Jacob Perkins <ja...@gmail.com>

> You're getting a 'ClassCastException' because the contents of the bags
> are DataByteArray and not long (or cannot be cast to long). I suspect
> that you're generating the contents of the bag in some way from a UDF,
> no?
>
> You need to either declare the output schema explicitly in the UDF or
> just use the 'AS' statement. For example, say you have a UDF that sums
> two numbers:
>
> data   = LOAD 'foobar' AS (a:int, b:int);
> summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
> DUMP summed;
>
> --jacob
> @thedatachef

Re: SUM

Posted by Jacob Perkins <ja...@gmail.com>.
You're getting a 'ClassCastException' because the contents of the bags
are DataByteArray and not long (or cannot be cast to long). I suspect
that you're generating the contents of the bag in some way from a UDF,
no? 

You need to either declare the output schema explicitly in the UDF or
just use the 'AS' statement. For example, say you have a UDF that sums
two numbers:

data   = LOAD 'foobar' AS (a:int, b:int);
summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
DUMP summed;
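The same idea for a Python (Jython) UDF, as a minimal sketch: declare the output schema with the decorator and make sure the function really returns that type. It assumes Pig's Jython engine supplies `outputSchema` to scripts registered with something like `REGISTER 'myudfs.py' USING jython AS myfuncs;` (file and namespace names are hypothetical, and how the decorator is made available may vary by Pig version); the fallback below only records the schema so the file also runs under plain Python.

```python
# Under Pig's Jython engine the outputSchema decorator is provided by Pig
# (assumption: the exact mechanism varies by version). This fallback only
# records the schema so the file also runs under plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("sum:int")
def MyFancySummingUDF(a, b):
    # Convert explicitly so the result really is an int, whether the inputs
    # arrive as ints, strings or raw bytes.
    return int(a) + int(b)
```

The explicit `int(...)` calls are the important part: the decorator only labels the output, so the conversion has to happen in the function body.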

--jacob
@thedatachef

On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> x = foreach g2 generate group, data.(size);
> dump x;
> 
> ((drm,0),{(464868)})
> ((drm,1),{(464868)})
> ((snezz,0),{(8073),(8073)})
> 
> but:
> x = foreach g2 generate group, SUM(data.size);
> 
> 
> 
> 
> 2011-04-24 18:02:18,910 [Thread-793] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
> ... 14 more
> 2011-04-24 18:02:19,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
> 2011-04-24 18:02:19,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2011-04-24 18:02:24,215 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
> 2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
> 2011-04-24 18:02:24,216 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> 
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias x
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
>         at org.apache.pig.PigServer.openIterator(PigServer.java:754)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>         at org.apache.pig.Main.run(Main.java:465)
>         at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>         at org.apache.pig.PigServer.openIterator(PigServer.java:744)
>         ... 7 more
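For reference, a plain-Python sketch of what `SUM(data.size)` should return for the three bags in the DUMP quoted above. `long_sum` is a hypothetical helper that only mimics LongSum's expectation that every element is already a long; it is not Pig's code. Note that DUMP prints a bytearray such as 464868 the same way it prints a long, so the dump alone likely cannot show which type the bag actually holds.

```python
# Bags from the DUMP above, keyed by the (domain, spam) group.
groups = {
    ("drm", 0): [(464868,)],
    ("drm", 1): [(464868,)],
    ("snezz", 0): [(8073,), (8073,)],
}


def long_sum(bag):
    """Sum the single field of each tuple, refusing anything that is not an int.

    Mimics only LongSum's expectation that elements are already longs."""
    total = 0
    for (value,) in bag:
        if type(value) is not int:
            raise TypeError("%r cannot be treated as a long" % (value,))
        total += value
    return total


for group, bag in groups.items():
    print(group, long_sum(bag))
# ('drm', 0) 464868
# ('drm', 1) 464868
# ('snezz', 0) 16146
```

Feeding it a bag of raw bytes, e.g. `long_sum([(b"8073",)])`, raises a TypeError, which is the plain-Python analogue of the DataByteArray-to-Long ClassCastException in the trace.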