Posted to user@pig.apache.org by pob <pe...@gmail.com> on 2011/04/24 18:02:51 UTC
SUM
x = foreach g2 generate group, data.(size);
dump x;
((drm,0),{(464868)})
((drm,1),{(464868)})
((snezz,0),{(8073),(8073)})
but:
x = foreach g2 generate group, SUM(data.size);
2011-04-24 18:02:18,910 [Thread-793] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
... 14 more
2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-24 18:02:24,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
2011-04-24 18:02:24,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias x
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
at org.apache.pig.PigServer.openIterator(PigServer.java:754)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
at org.apache.pig.Main.run(Main.java:465)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:744)
... 7 more
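Per the replies further down the thread, the root cause is that the values inside the bag are still untyped bytearrays (DataByteArray), so LongSum cannot cast them. A minimal workaround sketch, assuming the field and alias names used elsewhere in this thread (not a verified fix for this exact script):

```pig
-- Hypothetical: cast each scalar field explicitly *before* grouping, so SUM
-- sees longs rather than DataByteArray values. Bag-level casts were unreliable
-- before Pig 0.9 (see PIG-1758), so cast the fields first, then group.
typed = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
g2    = group typed by (domain, spam);
x     = foreach g2 generate group, SUM(typed.size);
```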
Re: SUM
Posted by pob <pe...@gmail.com>.
Even this simple thing doesn't work either:
grunt> z = foreach data generate time+size;
WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>
>> Strange, that looks right to me. What happens if you try the 'AS'
>> statement anyhow?
>>
>> --jacob
>> @thedatachef
>>
>> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
>> > Hello,
>> >
>> > pom = foreach rows generate myUDF.toTuple($1); -- reading data
>> > describe pom
>> > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
>> >
>> > data = foreach pom generate flatten($0);
>> > grunt> describe data;
>> > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
>> >
>> >
>> > I think they are cast correctly, right?
>> >
>> > UDF is python one with decorator
>> > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
>> >
>> > Thanks
>> >
>> >
>> >
>> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
>> >
>> > > You're getting a 'ClassCastException' because the contents of the bags
>> > > are DataByteArray and not long (or cannot be cast to long). I suspect
>> > > that you're generating the contents of the bag in some way from a UDF,
>> > > no?
>> > >
>> > > You need to either declare the output schema explicitly in the UDF or
>> > > just use the 'AS' statement. For example, say you have a UDF that sums
>> > > two numbers:
>> > >
>> > > data = LOAD 'foobar' AS (a:int, b:int);
>> > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
>> > > DUMP summed;
>> > >
>> > > --jacob
>> > > @thedatachef
Re: SUM
Posted by pob <pe...@gmail.com>.
This one doesn't work...
pom = foreach rows generate myUDF.toTuple($1) AS
(b:bag{t:tuple(domain:chararray,spam:int,size:long,time:float)});
2011-04-24 18:40:15,622 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered "" at line 1, column 50
Was expecting one of:
Re: SUM
Posted by pob <pe...@gmail.com>.
I think if I switch to 0.9, something else gets broken with Cassandra.
2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> I think it's the deep-casting issue from
> https://issues.apache.org/jira/browse/PIG-1758 .
> Should work in 0.9 but didn't get into 0.8 or 0.8.1
>
> D
>
> On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
>
> > That's strange, pygmalion works fine (but there aren't any numerical
> > operations in it).
> >
> > I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so idk :(
> >
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > That changes things entirely. There's some weirdness in the way data is
> > > read from Cassandra. Have you applied the latest patches (eg.
> > > https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
> > >
> > > See also some UDFs for working with Cassandra data that Jeremy Hanna
> > > (@jeromatron) wrote:
> > >
> > > https://github.com/jeromatron/pygmalion
> > >
> > >
> > > Best of luck!
> > >
> > > --jacob
> > > @thedatachef
> > >
> > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > Maybe I forgot one more thing: the rows are taken from Cassandra.
> > > >
> > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > > >
> > > > I have no idea how to write the AS clause for a bag in a foreach.
> > > >
> > > >
> > > > P.
> > > >
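The AS clause on the LOAD itself cannot type the tuple contents of the bag here, since bag-level casts were broken before PIG-1758 (fixed in Pig 0.9). A hedged sketch of the flatten-then-cast route suggested later in this thread (the chararray/long casts on name and value are illustrative assumptions, not known types for this column family):

```pig
rows  = LOAD 'cassandra://emailArchive/messagesMetaData' USING CassandraStorage()
        AS (key, columns: bag {T: tuple(name, value)});
-- Flatten first, then cast the resulting scalar fields one by one;
-- casting the whole bag inside an AS clause is what fails on Pig 0.8.
flat  = FOREACH rows GENERATE key, FLATTEN(columns) AS (name, value);
typed = FOREACH flat GENERATE key, (chararray) name, (long) value;
```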
Re: SUM
Posted by Jeremy Hanna <je...@gmail.com>.
Looks like you just need a validation class in Cassandra and it should work. That is, look at the column_metadata for that column below.
For example here's the script to create the data on the cassandra cli:
create keyspace pygmalion;
use pygmalion;
create column family account with comparator = UTF8Type and default_validation_class = UTF8Type and column_metadata = [{column_name: num_heads, validation_class: LongType}];
create column family betelgeuse with comparator = UTF8Type and default_validation_class = UTF8Type;
set account['hipcat']['first_name'] = 'Zaphod';
set account['hipcat']['last_name'] = 'Beeblebrox';
set account['hipcat']['birth_place'] = 'Betelgeuse Five';
set account['hipcat']['num_heads'] = '2';
set account['hoopyfrood']['first_name'] = 'Ford';
set account['hoopyfrood']['last_name'] = 'Prefect';
set account['hoopyfrood']['birth_place'] = 'Betelgeuse Five';
set account['hoopyfrood']['num_heads'] = '1';
set account['earthman']['first_name'] = 'Arthur';
set account['earthman']['last_name'] = 'Dent';
set account['earthman']['birth_place'] = 'Earth';
set account['earthman']['num_heads'] = '1';
And here's the pig script:
register '/Users/jeremyhanna/Work/pygmalion/udf/target/pygmalion-1.0.0-SNAPSHOT.jar';
raw = LOAD 'cassandra://pygmalion/account' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)});
rows = FOREACH raw GENERATE key, FLATTEN(org.pygmalion.udf.FromCassandraBag('first_name, last_name, birth_place, num_heads', columns)) AS (
first_name:chararray,
last_name:chararray,
birth_place:chararray,
num_heads:long
);
b = group rows by key;
x = foreach b generate group, SUM(rows.num_heads);
dump x;
That works and returns:
(hipcat,2)
(earthman,1)
(hoopyfrood,1)
That should work the same with your python UDF.
On Apr 25, 2011, at 9:59 AM, Jeremy Hanna wrote:
> Sorry - I've been kind of out of it this weekend. Talking about it on IRC. What I'd like to do is get a small set of data and a script that can reproduce what you're trying to do and then try various things in my own environment. That way we can more easily log a Cassandra ticket if it can't be worked into what's currently there. I'll respond to this thread when we have something to go forward with.
>
> On Apr 24, 2011, at 3:28 PM, Dmitriy Ryaboy wrote:
>
>> Sigh. @jeromatron , @thedatachef -- this one's on you :). Toldya you need
>> the LoadCaster...
>>
>>
>> D
>>
>> On Sun, Apr 24, 2011 at 1:17 PM, pob <pe...@gmail.com> wrote:
>>
>>> hello,
>>>
>>> thanks, but without success ;/
>>>
>>>
>>> grunt> pom = foreach rows generate myUDF.toTuple($1);
>>> grunt> describe pom
>>> pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time: bytearray)}}
>>> grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>>> grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
>>> grunt> describe data;
>>> data: {domain: chararray,spam: int,size: long,time: float}
>>>
>>> z = foreach data generate time+size;
>>>
>>>
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>> 2011-04-24 22:16:06,129 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
>>>
>>>
>>>
>>>
>>> z = foreach data generate time
>>>
>>>
>>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received
>>> a
>>> bytearray from the UDF. Cannot determine how to convert the bytearray to
>>> float.
>>> at
>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>>> at
>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>> at
>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>> at
>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>> at
>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>> at
>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>
>>>
>>> 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
>>>
>>>> Try this:
>>>>
>>>> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>>>> data = foreach data generate (chararray) domain, (int) spam, (long) size,
>>>> (float) time;
>>>>
>>>> Pig is inconsistent in what "as foo:type" does vs " (type) foo"
>>>>
>>>> D
>>>>
>>>> On Sun, Apr 24, 2011 at 10:44 AM, pob <pe...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> but why can't I re-cast it during flatten?
>>>>>
>>>>>
>>>>> data = foreach pom generate flatten($0) AS (domain:chararray, spam:int,
>>>>> size:long, time:float);
>>>>>
>>>>> grunt> z = foreach data generate time+size;
>>>>>
>>>>>
>>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
>>> be
>>>>> cast to java.lang.Float
>>>>> at
>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
>>>>> at
>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>>>> at
>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>> at
>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>> at
>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>> at
>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>> at
>>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>
>>>>>
>>>>>
>>>>> 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
>>>>>
>>>>>> I think it's the deep-casting issue from
>>>>>> https://issues.apache.org/jira/browse/PIG-1758 .
>>>>>> Should work in 0.9 but didn't get into 0.8 or 0.8.1
>>>>>>
>>>>>> D
>>>>>>
>>>>>> On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
>>>>>>
>>>>>>> That's strange, pygmalion works fine (but there aren't any numerical
>>>>>>> operations).
>>>>>>>
>>>>>>> I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so I
>>>>>>> don't know :(
>>>>>>>
>>>>>>>
>>>>>>> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>>>>>>>
>>>>>>>> That changes things entirely. There's some weirdness in the way
>>>> data
>>>>> is
>>>>>>>> read from Cassandra. Have you applied the latest patches (eg.
>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
>>>>>>>>
>>>>>>>> See also some UDFs for working with Cassandra data that Jeremy
>>>> Hanna
>>>>>>>> (@jeromatron) wrote:
>>>>>>>>
>>>>>>>> https://github.com/jeromatron/pygmalion
>>>>>>>>
>>>>>>>>
>>>>>>>> Best of luck!
>>>>>>>>
>>>>>>>> --jacob
>>>>>>>> @thedatachef
>>>>>>>>
>>>>>>>> On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
>>>>>>>>> Maybe I forget one more thing, rows are taken from Cassandra.
>>>>>>>>>
>>>>>>>>> rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
>>>>>>>>> CassandraStorage() AS (key, columns: bag {T: tuple(name,
>>>> value)});
>>>>>>>>>
>>>>>>>>> I have no idea how to format the AS clause for a bag in foreach.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> P.
>>>>>>>>>
>>>>>>>>> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>>>>>>>>>
>>>>>>>>>> Strange, that looks right to me. What happens if you try the
>>>> 'AS'
>>>>>>>>>> statement anyhow?
>>>>>>>>>>
>>>>>>>>>> --jacob
>>>>>>>>>> @thedatachef
>>>>>>>>>>
>>>>>>>>>> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> pom = foreach rows generate myUDF.toTuple($1); -- reading
>>>> data
>>>>>>>>>>> describe pom
>>>>>>>>>>> pom: {y: {t: (domain: chararray,spam: int,size: long,time:
>>>>>> float)}}
>>>>>>>>>>>
>>>>>>>>>>> data = foreach pom generate flatten($0);
>>>>>>>>>>> grunt> describe data;
>>>>>>>>>>> data: {y::domain: chararray,y::spam: int,y::size:
>>>> long,y::time:
>>>>>>>> float}
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think they are cast fine, right?
>>>>>>>>>>>
>>>>>>>>>>> The UDF is a Python one with the decorator
>>>>>>>>>>> @outputSchema("y:bag{t:tuple(domain:chararray, spam:int,
>>>>>> size:long,
>>>>>>>>>>> time:float)}")
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>>> You're getting a 'ClassCastException' because the
>>> contents
>>>> of
>>>>>> the
>>>>>>>> bags
>>>>>>>>>>>> are DataByteArray and not long (or cannot be cast to
>>> long).
>>>> I
>>>>>>>> suspect
>>>>>>>>>>>> that you're generating the contents of the bag in some
>>> way
>>>>> from
>>>>>> a
>>>>>>>> UDF,
>>>>>>>>>>>> no?
>>>>>>>>>>>>
>>>>>>>>>>>> You need to either declare the output schema explicitly
>>> in
>>>>> the
>>>>>>> UDF
>>>>>>>> or
>>>>>>>>>>>> just use the 'AS' statement. For example, say you have a
>>>> UDF
>>>>>> that
>>>>>>>> sums
>>>>>>>>>>>> two numbers:
>>>>>>>>>>>>
>>>>>>>>>>>> data = LOAD 'foobar' AS (a:int, b:int);
>>>>>>>>>>>> summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS
>>>>>>> (sum:int);
>>>>>>>>>>>> DUMP summed;
>>>>>>>>>>>>
>>>>>>>>>>>> --jacob
>>>>>>>>>>>> @thedatachef
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
>>>>>>>>>>>>> x = foreach g2 generate group, data.(size);
>>>>>>>>>>>>> dump x;
>>>>>>>>>>>>>
>>>>>>>>>>>>> ((drm,0),{(464868)})
>>>>>>>>>>>>> ((drm,1),{(464868)})
>>>>>>>>>>>>> ((snezz,0),{(8073),(8073)})
>>>>>>>>>>>>>
>>>>>>>>>>>>> but:
>>>>>>>>>>>>> x = foreach g2 generate group, SUM(data.size);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2011-04-24 18:02:18,910 [Thread-793] WARN
>>>>>>>>>>>>> org.apache.hadoop.mapred.LocalJobRunner -
>>> job_local_0038
>>>>>>>>>>>>> org.apache.pig.backend.executionengine.ExecException:
>>>> ERROR
>>>>>>> 2106:
>>>>>>>>>> Error
>>>>>>>>>>>>> while computing sum in Initial
>>>>>>>>>>>>> at
>>>>>> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
>>>>>>>>>>>>> at
>>>>>> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>>>>>>>>>> at
>>>> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>>>>>>>>> at
>>>>>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>>>>>>>> at
>>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>>>>>>> at
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>>>>>>>> Caused by: java.lang.ClassCastException:
>>>>>>>>>>>> org.apache.pig.data.DataByteArray
>>>>>>>>>>>>> cannot be cast to java.lang.Long
>>>>>>>>>>>>> at
>>>>>> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
>>>>>>>>>>>>> ... 14 more
>>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - HadoopJobId: job_local_0038
>>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - 0% complete
>>>>>>>>>>>>> 2011-04-24 18:02:24,215 [main] INFO
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - job job_local_0038 has failed! Stop running all
>>>> dependent
>>>>>>> jobs
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>>>>>>>>>> - 100% complete
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] ERROR
>>>>>>>>>>>>> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map
>>> reduce
>>>>>>> job(s)
>>>>>>>>>> failed!
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO
>>>>>>>>>>>> org.apache.pig.tools.pigstats.PigStats
>>>>>>>>>>>>> - Detected Local mode. Stats reported below may be
>>>>> incomplete
>>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO
>>>>>>>>>>>> org.apache.pig.tools.pigstats.PigStats
>>>>>>>>>>>>> - Script Statistics:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pig Stack Trace
>>>>>>>>>>>>> ---------------
>>>>>>>>>>>>> ERROR 1066: Unable to open iterator for alias x
>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException:
>>> ERROR
>>>>>> 1066:
>>>>>>>>>> Unable to
>>>>>>>>>>>>> open iterator for alias x
>>>>>>>>>>>>> at
>>>>>>>> org.apache.pig.PigServer.openIterator(PigServer.java:754)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>>>>>>>> at
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>>>>>>>>> at
>>>>>> org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>>>>>>>>> at org.apache.pig.Main.run(Main.java:465)
>>>>>>>>>>>>> at org.apache.pig.Main.main(Main.java:107)
>>>>>>>>>>>>> Caused by: java.io.IOException: Job terminated with
>>>>> anomalous
>>>>>>>> status
>>>>>>>>>>>> FAILED
>>>>>>>>>>>>> at
>>>>>>>> org.apache.pig.PigServer.openIterator(PigServer.java:744)
>>>>>>>>>>>>> ... 7 more
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
Re: SUM
Posted by Jeremy Hanna <je...@gmail.com>.
Sorry - I've been kind of out of it this weekend. We've been talking about it on IRC. What I'd like to do is get a small set of data and a script that reproduces what you're trying to do, and then try various things in my own environment. That way we can more easily log a Cassandra ticket if it can't be worked into what's currently there. I'll respond to this thread when we have something to go forward with.
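For anyone assembling that repro, a minimal script pieced together from this thread might look like the following (keyspace/column family and UDF names come from earlier messages; treat it as a sketch, not a verified test case):

```pig
-- Hypothetical minimal repro: load rows from Cassandra and attempt a
-- numeric operation on fields that arrive as bytearrays from the UDF.
rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
       CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
pom  = FOREACH rows GENERATE myUDF.toTuple($1);
data = FOREACH pom GENERATE FLATTEN($0) AS (domain, spam, size, time);
data = FOREACH data GENERATE (chararray) domain, (int) spam,
                             (long) size, (float) time;
-- Without a LoadCaster this fails with ERROR 1075 / ClassCastException:
z    = FOREACH data GENERATE time + size;
DUMP z;
```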
Re: SUM
Posted by pob <pe...@gmail.com>.
with pig 0.9.0
grunt> data = foreach data generate (chararray) domain, (long) spam, (long)
size, (long) time;
2011-04-25 00:48:15,093 [main] WARN org.apache.pig.PigServer - Encountered
Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
grunt> describe data;
2011-04-25 00:48:20,354 [main] WARN org.apache.pig.PigServer - Encountered
Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
data: {domain: chararray,spam: long,size: long,time: long}
grunt> z = foreach data generate time+size;
2011-04-25 00:48:31,557 [main] WARN org.apache.pig.PigServer - Encountered
Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a
bytearray from the UDF. Cannot determine how to convert the bytearray to
float.
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:534)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:341)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.genericGetNext(Add.java:84)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:119)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
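Since the root cause is the UDF handing Pig raw bytearrays, one possible workaround (a sketch only, not a fix confirmed in this thread) is to do the type conversion inside the Jython UDF itself, so nothing downstream has to cast a bytearray. The `outputSchema` decorator is stubbed below so the sketch runs standalone; in a real Jython UDF Pig supplies it:

```python
# Stub of Pig's outputSchema decorator so this sketch is self-contained;
# under Pig's Jython support, the real decorator is provided for you.
def outputSchema(schema):
    def wrap(fn):
        fn.outputSchema = schema
        return fn
    return wrap

@outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
def toTuple(columns):
    # columns is a bag of (name, value) pairs, as loaded by CassandraStorage.
    # Convert each value to a concrete Python type before returning, so Pig
    # receives typed fields instead of DataByteArray.
    fields = dict((str(name), str(value)) for (name, value) in columns)
    return [(fields['domain'],
             int(fields['spam']),
             int(fields['size']),
             float(fields['time']))]
```

With the conversion done here, the later `time + size` foreach would operate on real numeric types rather than relying on Pig's bytearray casting.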
2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> Sigh. @jeromatron , @thedatachef -- this one's on you :). Toldya you need
> the LoadCaster...
>
>
> D
>
> On Sun, Apr 24, 2011 at 1:17 PM, pob <pe...@gmail.com> wrote:
>
> > hello,
> >
> > thanks but w/out sucess ;/
> >
> >
> > grunt> pom = foreach rows generate myUDF.toTuple($1);
> > grunt> describe pom
> > pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time:
> > bytearray)}}
> > grunt> data = foreach pom generate flatten($0) as (domain, spam, size,
> > time);
> > grunt> data = foreach data generate (chararray) domain, (int) spam,
> (long)
> > size,
> > >> (float) time;
> > grunt> describe data;
> > data: {domain: chararray,spam: int,size: long,time: float}
> >
> > z = foreach data generate time+size;
> >
> >
> > org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a
> > bytearray from the UDF. Cannot determine how to convert the bytearray to float.
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > 2011-04-24 22:16:06,129 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
> >
> >
> >
> >
> > z = foreach data generate time
> >
> >
> > org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a
> > bytearray from the UDF. Cannot determine how to convert the bytearray to float.
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> >
> >
> > 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> >
> > > Try this:
> > >
> > > data = foreach pom generate flatten($0) as (domain, spam, size, time);
> > > data = foreach data generate (chararray) domain, (int) spam, (long)
> size,
> > > (float) time;
> > >
> > > Pig is inconsistent about what "as foo:type" does vs. "(type) foo".
> > >
> > > D
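The distinction Dmitriy draws can be made concrete. In Pig 0.8, an "as" clause only declares a schema (DESCRIBE reports the declared types), while the "(type)" form emits a real cast at runtime. A sketch using the aliases from this thread:

```pig
-- 'as' only declares types; with an untyped UDF upstream, the values
-- underneath may still be DataByteArray at runtime (Pig 0.8):
a = foreach pom generate flatten($0) as (domain: chararray, spam: int,
                                         size: long, time: float);

-- explicit casts insert POCast operators and actually convert the values:
b = foreach a generate (chararray) domain, (int) spam,
                       (long) size, (float) time;
```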
> > >
> > > On Sun, Apr 24, 2011 at 10:44 AM, pob <pe...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > but why can't I re-cast it during flatten?
> > > >
> > > >
> > > > data = foreach pom generate flatten($0) AS (domain:chararray,
> spam:int,
> > > > size:long, time:float);
> > > >
> > > > grunt> z = foreach data generate time+size;
> > > >
> > > >
> > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > >
> > > >
> > > >
> > > > 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> > > >
> > > > > I think it's the deep-casting issue from
> > > > > https://issues.apache.org/jira/browse/PIG-1758 .
> > > > > Should work in 0.9 but didn't get into 0.8 or 0.8.1
> > > > >
> > > > > D
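Since the deep-cast fix did not make it into 0.8 or 0.8.1, one workaround sketch (assuming the data/g2 aliases from earlier in the thread) is to cast the field in a plain FOREACH before grouping, so the grouped bag already holds longs and LongSum never sees a DataByteArray:

```pig
-- cast before GROUP so the bag contains real longs:
typed = foreach data generate domain, spam, (long) size;
g2    = group typed by (domain, spam);
x     = foreach g2 generate group, SUM(typed.size);
```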
> > > > >
> > > > > On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
> > > > >
> > > > > > That's strange, pygmalion works fine (but there aren't any numerical
> > > > > > operations).
> > > > > >
> > > > > > I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so
> > > > > > idk :(
> > > > > >
> > > > > >
> > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > >
> > > > > > > That changes things entirely. There's some weirdness in the way
> > > data
> > > > is
> > > > > > > read from Cassandra. Have you applied the latest patches (eg.
> > > > > > > https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
> > > > > > >
> > > > > > > See also some UDFs for working with Cassandra data that Jeremy
> > > Hanna
> > > > > > > (@jeromatron) wrote:
> > > > > > >
> > > > > > > https://github.com/jeromatron/pygmalion
> > > > > > >
> > > > > > >
> > > > > > > Best of luck!
> > > > > > >
> > > > > > > --jacob
> > > > > > > @thedatachef
> > > > > > >
> > > > > > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > > > > > Maybe I forgot one more thing: rows are taken from Cassandra.
> > > > > > > >
> > > > > > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > > > > > > CassandraStorage() AS (key, columns: bag {T: tuple(name,
> > > value)});
> > > > > > > >
> > > > > > > > I have no idea how to format AS for bag in foreach.
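For reference, a bag schema can be spelled out in a FOREACH ... AS clause much as in LOAD. A sketch against the rows relation above (field types are illustrative, and as discussed elsewhere in the thread, this declares a schema rather than casting the values):

```pig
-- naming a bag with a full schema in FOREACH ... AS:
named = foreach rows generate key,
        columns as columns: bag {T: tuple(name: chararray, value: chararray)};
```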
> > > > > > > >
> > > > > > > >
> > > > > > > > P.
> > > > > > > >
> > > > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > > > >
> > > > > > > > > Strange, that looks right to me. What happens if you try
> the
> > > 'AS'
> > > > > > > > > statement anyhow?
> > > > > > > > >
> > > > > > > > > --jacob
> > > > > > > > > @thedatachef
> > > > > > > > >
> > > > > > > > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > pom = foreach rows generate myUDF.toTuple($1); -- reading
> > > data
> > > > > > > > > > describe pom
> > > > > > > > > > pom: {y: {t: (domain: chararray,spam: int,size:
> long,time:
> > > > > float)}}
> > > > > > > > > >
> > > > > > > > > > data = foreach pom generate flatten($0);
> > > > > > > > > > grunt> describe data;
> > > > > > > > > > data: {y::domain: chararray,y::spam: int,y::size:
> > > long,y::time:
> > > > > > > float}
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think they are cast fine, right?
> > > > > > > > > >
> > > > > > > > > > UDF is python one with decorator
> > > > > > > > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int,
> > > > > size:long,
> > > > > > > > > > time:float)}")
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > > > > > >
> > > > > > > > > > > You're getting a 'ClassCastException' because the
> > contents
> > > of
> > > > > the
> > > > > > > bags
> > > > > > > > > > > are DataByteArray and not long (or cannot be cast to
> > long).
> > > I
> > > > > > > suspect
> > > > > > > > > > > that you're generating the contents of the bag in some
> > way
> > > > from
> > > > > a
> > > > > > > UDF,
> > > > > > > > > > > no?
> > > > > > > > > > >
> > > > > > > > > > > You need to either declare the output schema explicitly
> > in
> > > > the
> > > > > > UDF
> > > > > > > or
> > > > > > > > > > > just use the 'AS' statement. For example, say you have
> a
> > > UDF
> > > > > that
> > > > > > > sums
> > > > > > > > > > > two numbers:
> > > > > > > > > > >
> > > > > > > > > > > data = LOAD 'foobar' AS (a:int, b:int);
> > > > > > > > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b)
> AS
> > > > > > (sum:int);
> > > > > > > > > > > DUMP summed;
> > > > > > > > > > >
> > > > > > > > > > > --jacob
> > > > > > > > > > > @thedatachef
> > > > > > > > > > >
> > > > > > > > > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > > > > > > > > x = foreach g2 generate group, data.(size);
> > > > > > > > > > > > dump x;
> > > > > > > > > > > >
> > > > > > > > > > > > ((drm,0),{(464868)})
> > > > > > > > > > > > ((drm,1),{(464868)})
> > > > > > > > > > > > ((snezz,0),{(8073),(8073)})
> > > > > > > > > > > >
> > > > > > > > > > > > but:
> > > > > > > > > > > > x = foreach g2 generate group, SUM(data.size);
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2011-04-24 18:02:18,910 [Thread-793] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> > > > > > > > > > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
> > > > > > > > > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> > > > > > > > > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > > > > > > > > > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > > > > > > > > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > > > > > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > > > > > > > > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > > > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > > > > > > > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
> > > > > > > > > > > > at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
> > > > > > > > > > > > ... 14 more
> > > > > > > > > > > > 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
> > > > > > > > > > > > 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> > > > > > > > > > > > 2011-04-24 18:02:24,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
> > > > > > > > > > > > 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> > > > > > > > > > > > 2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> > > > > > > > > > > > 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
> > > > > > > > > > > > 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> > > > > > > > > > > >
> > > > > > > > > > > > Pig Stack Trace
> > > > > > > > > > > > ---------------
> > > > > > > > > > > > ERROR 1066: Unable to open iterator for alias x
> > > > > > > > > > > >
> > > > > > > > > > > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
> > > > > > > > > > > > at org.apache.pig.PigServer.openIterator(PigServer.java:754)
> > > > > > > > > > > > at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> > > > > > > > > > > > at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> > > > > > > > > > > > at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > > > > > > > > > > at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> > > > > > > > > > > > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> > > > > > > > > > > > at org.apache.pig.Main.run(Main.java:465)
> > > > > > > > > > > > at org.apache.pig.Main.main(Main.java:107)
> > > > > > > > > > > > Caused by: java.io.IOException: Job terminated with anomalous status FAILED
> > > > > > > > > > > > at org.apache.pig.PigServer.openIterator(PigServer.java:744)
> > > > > > > > > > > > ... 7 more
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
Re: SUM
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Sigh. @jeromatron , @thedatachef -- this one's on you :). Toldya you need
the LoadCaster...
D
Re: SUM
Posted by pob <pe...@gmail.com>.
hello,
thanks, but without success ;/
grunt> pom = foreach rows generate myUDF.toTuple($1);
grunt> describe pom
pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time: bytearray)}}
grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size,
>> (float) time;
grunt> describe data;
data: {domain: chararray,spam: int,size: long,time: float}
z = foreach data generate time+size;
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-04-24 22:16:06,129 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
z = foreach data generate time
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
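[Editor's sketch] The ERROR 1075 above means the field is still a raw bytearray when the cast runs, so the arithmetic never sees numbers. As a rough standalone illustration in plain Python (made-up byte values; field names borrowed from the thread; not Pig internals), this is the kind of conversion the (chararray)/(int)/(long)/(float) casts have to perform before time+size can succeed:

```python
def cast_row(raw_fields):
    """Convert a (domain, spam, size, time) tuple of raw bytes
    (Pig's DataByteArray) into (chararray, int, long, float)."""
    domain, spam, size, time = raw_fields
    return (
        domain.decode("utf-8"),   # chararray
        int(spam.decode()),       # int
        int(size.decode()),       # long (Python ints are unbounded)
        float(time.decode()),     # float
    )

# Made-up row echoing the dump output earlier in the thread.
row = (b"drm", b"0", b"464868", b"12.5")
domain, spam, size, time = cast_row(row)
print(time + size)  # arithmetic works once both operands are numeric
```

Until a conversion like this actually runs, adding the two fields fails exactly as in the trace above.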
2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> Try this:
>
> data = foreach pom generate flatten($0) as (domain, spam, size, time);
> data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
>
> Pig is inconsistent in what "as foo:type" does vs. "(type) foo"
>
> D
>
> On Sun, Apr 24, 2011 at 10:44 AM, pob <pe...@gmail.com> wrote:
>
> > Hi,
> >
> > but why can't I re-cast it during flatten?
> >
> > data = foreach pom generate flatten($0) AS (domain:chararray, spam:int, size:long, time:float);
> >
> > grunt> z = foreach data generate time+size;
> >
> > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> >
> > 2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> >
> > > I think it's the deep-casting issue from
> > > https://issues.apache.org/jira/browse/PIG-1758 .
> > > Should work in 0.9 but didn't get into 0.8 or 0.8.1
> > >
> > > D
> > >
> > > On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
> > >
> > > > That's strange, pygmalion works fine (but there aren't any numerical operations).
> > > >
> > > > I think I'm using C* 0.7.5 where it's supposed to be patched ;/ so idk :(
> > > >
> > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > >
> > > > > That changes things entirely. There's some weirdness in the way data is read from Cassandra. Have you applied the latest patches (eg. https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
> > > > >
> > > > > See also some UDFs for working with Cassandra data that Jeremy Hanna (@jeromatron) wrote:
> > > > >
> > > > > https://github.com/jeromatron/pygmalion
> > > > >
> > > > > Best of luck!
> > > > >
> > > > > --jacob
> > > > > @thedatachef
> > > > >
> > > > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > > > Maybe I forgot one more thing: rows are taken from Cassandra.
> > > > > >
> > > > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > > > > >
> > > > > > I have no idea how to format AS for a bag in foreach.
> > > > > >
> > > > > > P.
> > > > > >
> > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > >
> > > > > > > Strange, that looks right to me. What happens if you try the 'AS' statement anyhow?
> > > > > > >
> > > > > > > --jacob
> > > > > > > @thedatachef
> > > > > > >
> > > > > > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > > > > > > describe pom
> > > > > > > > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> > > > > > > >
> > > > > > > > data = foreach pom generate flatten($0);
> > > > > > > > grunt> describe data;
> > > > > > > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> > > > > > > >
> > > > > > > > I think they are cast fine, right?
> > > > > > > >
> > > > > > > > The UDF is a Python one with decorator
> > > > > > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > > > >
> > > > > > > > > You're getting a 'ClassCastException' because the contents of the bags are DataByteArray and not long (or cannot be cast to long). I suspect that you're generating the contents of the bag in some way from a UDF, no?
> > > > > > > > >
> > > > > > > > > You need to either declare the output schema explicitly in the UDF or just use the 'AS' statement. For example, say you have a UDF that sums two numbers:
> > > > > > > > >
> > > > > > > > > data = LOAD 'foobar' AS (a:int, b:int);
> > > > > > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
> > > > > > > > > DUMP summed;
> > > > > > > > >
> > > > > > > > > --jacob
> > > > > > > > > @thedatachef
Re: SUM
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Try this:
data = foreach pom generate flatten($0) as (domain, spam, size, time);
data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
Pig is inconsistent in what "as foo:type" does vs. "(type) foo".
D
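[Editor's sketch] A loose Python analogy for this distinction (an illustration of the observed behavior, not of Pig internals): "as foo:type" acts like a type annotation, which declares a type without converting anything, while "(type) foo" is an actual conversion:

```python
# "as foo:type": a declaration only -- the raw bytes are untouched,
# so downstream arithmetic still blows up.
size: int = b"464868"
print(type(size).__name__)        # still bytes despite the annotation

# "(type) foo": a real conversion, like calling the constructor.
size = int(b"464868")
print(type(size).__name__, size)  # now an actual int
```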
Re: SUM
Posted by pob <pe...@gmail.com>.
Hi,
but why can't I re-cast it during flatten?
data = foreach pom generate flatten($0) AS (domain:chararray, spam:int, size:long, time:float);
grunt> z = foreach data generate time+size;
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
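[Editor's sketch] What the "deep cast" fix (PIG-1758, mentioned earlier in the thread) has to do can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not Pig's implementation: applying AS with types to a flattened tuple only helps if the cast recurses into the tuple and converts every raw field; without that recursion the DataByteArray leaks through, as in the trace above.

```python
# Illustrative type-name table; decode-then-convert stands in for
# Pig's bytearray-to-scalar casts.
CASTS = {
    "chararray": lambda b: b.decode("utf-8"),
    "int": lambda b: int(b.decode()),
    "long": lambda b: int(b.decode()),
    "float": lambda b: float(b.decode()),
}

def deep_cast(value, schema):
    """Recursively convert raw byte fields according to a schema:
    a schema is either a type name or a tuple of nested schemas."""
    if isinstance(schema, tuple):
        return tuple(deep_cast(v, s) for v, s in zip(value, schema))
    return CASTS[schema](value)

row = (b"drm", b"0", b"464868", b"12.5")
schema = ("chararray", "int", "long", "float")
print(deep_cast(row, schema))  # ('drm', 0, 464868, 12.5)
```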
2011/4/24 Dmitriy Ryaboy <dv...@gmail.com>
> I think it's the deep-casting issue from
> https://issues.apache.org/jira/browse/PIG-1758 .
> Should work in 0.9 but didn't get into 0.8 or 0.8.1
>
> D
>
> On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
>
> > Thats stramge, pygmalion works fine (but there are any numerical
> > operations).
> >
> > I think Im using C* 0.7.5 where it suppose to be patched ;/ so idk :(
> >
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > That changes things entirely. There's some weirdness in the way data is
> > > read from Cassandra. Have you applied the latest patches (eg.
> > > https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
> > >
> > > See also some UDFs for working with Cassandra data that Jeremy Hanna
> > > (@jeromatron) wrote:
> > >
> > > https://github.com/jeromatron/pygmalion
> > >
> > >
> > > Best of luck!
> > >
> > > --jacob
> > > @thedatachef
> > >
> > > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > > Maybe I forget one more thing, rows are taken from Cassandra.
> > > >
> > > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > > >
> > > > I have no idea how to format AS for bag in foreach.
> > > >
> > > >
> > > > P.
> > > >
> > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > >
> > > > > Strange, that looks right to me. What happens if you try the 'AS'
> > > > > statement anyhow?
> > > > >
> > > > > --jacob
> > > > > @thedatachef
> > > > >
> > > > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > > > Hello,
> > > > > >
> > > > > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > > > > describe pom
> > > > > > pom: {y: {t: (domain: chararray,spam: int,size: long,time:
> float)}}
> > > > > >
> > > > > > data = foreach pom generate flatten($0);
> > > > > > grunt> describe data;
> > > > > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time:
> > > float}
> > > > > >
> > > > > >
> > > > > > I thing they are casted fine, right?
> > > > > >
> > > > > > UDF is python one with decorator
> > > > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int,
> size:long,
> > > > > > time:float)}")
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > > >
> > > > > > > You're getting a 'ClassCastException' because the contents of
> the
> > > bags
> > > > > > > are DataByteArray and not long (or cannot be cast to long). I
> > > suspect
> > > > > > > that you're generating the contents of the bag in some way from
> a
> > > UDF,
> > > > > > > no?
> > > > > > >
> > > > > > > You need to either declare the output schema explicitly in the
> > UDF
> > > or
> > > > > > > just use the 'AS' statement. For example, say you have a UDF
> that
> > > sums
> > > > > > > two numbers:
> > > > > > >
> > > > > > > data = LOAD 'foobar' AS (a:int, b:int);
> > > > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> > > > > > > DUMP summed;
> > > > > > >
> > > > > > > --jacob
> > > > > > > @thedatachef
> > > > > > >
> > > > > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > > > > x = foreach g2 generate group, data.(size);
> > > > > > > > dump x;
> > > > > > > >
> > > > > > > > ((drm,0),{(464868)})
> > > > > > > > ((drm,1),{(464868)})
> > > > > > > > ((snezz,0),{(8073),(8073)})
> > > > > > > >
> > > > > > > > but:
> > > > > > > > x = foreach g2 generate group, SUM(data.size);
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2011-04-24 18:02:18,910 [Thread-793] WARN
> > > > > > > > org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> > > > > > > > org.apache.pig.backend.executionengine.ExecException: ERROR
> > 2106:
> > > > > Error
> > > > > > > > while computing sum in Initial
> > > > > > > > at
> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> > > > > > > > at
> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > > > > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > > > at
> > > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > > > > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > at
> > > > > > >
> > > > >
> > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > > > > > Caused by: java.lang.ClassCastException:
> > > > > > > org.apache.pig.data.DataByteArray
> > > > > > > > cannot be cast to java.lang.Long
> > > > > > > > at
> org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
> > > > > > > > ... 14 more
> > > > > > > > 2011-04-24 18:02:19,213 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - HadoopJobId: job_local_0038
> > > > > > > > 2011-04-24 18:02:19,213 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - 0% complete
> > > > > > > > 2011-04-24 18:02:24,215 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - job job_local_0038 has failed! Stop running all dependent
> > jobs
> > > > > > > > 2011-04-24 18:02:24,216 [main] INFO
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > > > > - 100% complete
> > > > > > > > 2011-04-24 18:02:24,216 [main] ERROR
> > > > > > > > org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce
> > job(s)
> > > > > failed!
> > > > > > > > 2011-04-24 18:02:24,216 [main] INFO
> > > > > > > org.apache.pig.tools.pigstats.PigStats
> > > > > > > > - Detected Local mode. Stats reported below may be incomplete
> > > > > > > > 2011-04-24 18:02:24,216 [main] INFO
> > > > > > > org.apache.pig.tools.pigstats.PigStats
> > > > > > > > - Script Statistics:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Pig Stack Trace
> > > > > > > > ---------------
> > > > > > > > ERROR 1066: Unable to open iterator for alias x
> > > > > > > >
> > > > > > > > org.apache.pig.impl.logicalLayer.FrontendException: ERROR
> 1066:
> > > > > Unable to
> > > > > > > > open iterator for alias x
> > > > > > > > at
> > > org.apache.pig.PigServer.openIterator(PigServer.java:754)
> > > > > > > > at
> > > > > > > >
> > > > >
> > >
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > > > > > > at
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> > > > > > > > at
> org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> > > > > > > > at org.apache.pig.Main.run(Main.java:465)
> > > > > > > > at org.apache.pig.Main.main(Main.java:107)
> > > > > > > > Caused by: java.io.IOException: Job terminated with anomalous
> > > status
> > > > > > > FAILED
> > > > > > > > at
> > > org.apache.pig.PigServer.openIterator(PigServer.java:744)
> > > > > > > > ... 7 more
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
> >
>
Re: SUM
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I think it's the deep-casting issue from
https://issues.apache.org/jira/browse/PIG-1758 .
Should work in 0.9 but didn't get into 0.8 or 0.8.1
D
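[Editorial note: until that deep-cast fix is available, a workaround is to cast the field to long explicitly before grouping, so LongSum never sees a DataByteArray. A sketch in Pig Latin, reusing the aliases from earlier in the thread; the exact GROUP key is an assumption:]

```
-- re-project with an explicit cast so the bag contents are typed long
typed = FOREACH data GENERATE y::domain AS domain, y::spam AS spam,
        (long) y::size AS size;
g2 = GROUP typed BY (domain, spam);
-- SUM now operates on a bag of longs instead of bytearrays
x = FOREACH g2 GENERATE group, SUM(typed.size);
```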
On Sun, Apr 24, 2011 at 9:52 AM, pob <pe...@gmail.com> wrote:
> That's strange, pygmalion works fine (but it doesn't do any numerical
> operations).
>
> I think I'm using C* 0.7.5, where it's supposed to be patched, so I don't know :(
>
>
> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>
> > That changes things entirely. There's some weirdness in the way data is
> > read from Cassandra. Have you applied the latest patches (eg.
> > https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
> >
> > See also some UDFs for working with Cassandra data that Jeremy Hanna
> > (@jeromatron) wrote:
> >
> > https://github.com/jeromatron/pygmalion
> >
> >
> > Best of luck!
> >
> > --jacob
> > @thedatachef
> >
> > On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > > Maybe I forgot one more thing: the rows are taken from Cassandra.
> > >
> > > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> > >
> > > I have no idea how to write an AS clause for a bag in a foreach.
> > >
> > >
> > > P.
> > >
> > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > >
> > > > Strange, that looks right to me. What happens if you try the 'AS'
> > > > statement anyhow?
> > > >
> > > > --jacob
> > > > @thedatachef
> > > >
> > > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > > Hello,
> > > > >
> > > > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > > > describe pom
> > > > > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> > > > >
> > > > > data = foreach pom generate flatten($0);
> > > > > grunt> describe data;
> > > > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time:
> > float}
> > > > >
> > > > >
> > > > > I think they are cast fine, right?
> > > > >
> > > > > UDF is python one with decorator
> > > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
> > > > > time:float)}")
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > > >
> > > > > > You're getting a 'ClassCastException' because the contents of the
> > bags
> > > > > > are DataByteArray and not long (or cannot be cast to long). I
> > suspect
> > > > > > that you're generating the contents of the bag in some way from a
> > UDF,
> > > > > > no?
> > > > > >
> > > > > > You need to either declare the output schema explicitly in the
> UDF
> > or
> > > > > > just use the 'AS' statement. For example, say you have a UDF that
> > sums
> > > > > > two numbers:
> > > > > >
> > > > > > data = LOAD 'foobar' AS (a:int, b:int);
> > > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> > > > > > DUMP summed;
> > > > > >
> > > > > > --jacob
> > > > > > @thedatachef
> > > > > >
> > > > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > > > x = foreach g2 generate group, data.(size);
> > > > > > > dump x;
> > > > > > >
> > > > > > > ((drm,0),{(464868)})
> > > > > > > ((drm,1),{(464868)})
> > > > > > > ((snezz,0),{(8073),(8073)})
> > > > > > >
> > > > > > > but:
> > > > > > > x = foreach g2 generate group, SUM(data.size);
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > [log output and stack trace snipped; identical to the copy quoted earlier]
> > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> >
> >
> >
>
Re: SUM
Posted by pob <pe...@gmail.com>.
That's strange, pygmalion works fine (but it doesn't do any numerical
operations).
I think I'm using C* 0.7.5, where it's supposed to be patched, so I don't know :(
2011/4/24 Jacob Perkins <ja...@gmail.com>
> That changes things entirely. There's some weirdness in the way data is
> read from Cassandra. Have you applied the latest patches (eg.
> https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
>
> See also some UDFs for working with Cassandra data that Jeremy Hanna
> (@jeromatron) wrote:
>
> https://github.com/jeromatron/pygmalion
>
>
> Best of luck!
>
> --jacob
> @thedatachef
>
> On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> > Maybe I forgot one more thing: the rows are taken from Cassandra.
> >
> > rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> > CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
> >
> > I have no idea how to write an AS clause for a bag in a foreach.
> >
> >
> > P.
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > Strange, that looks right to me. What happens if you try the 'AS'
> > > statement anyhow?
> > >
> > > --jacob
> > > @thedatachef
> > >
> > > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > > Hello,
> > > >
> > > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > > describe pom
> > > > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> > > >
> > > > data = foreach pom generate flatten($0);
> > > > grunt> describe data;
> > > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time:
> float}
> > > >
> > > >
> > > > I think they are cast fine, right?
> > > >
> > > > UDF is python one with decorator
> > > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
> > > > time:float)}")
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > > >
> > > > > You're getting a 'ClassCastException' because the contents of the
> bags
> > > > > are DataByteArray and not long (or cannot be cast to long). I
> suspect
> > > > > that you're generating the contents of the bag in some way from a
> UDF,
> > > > > no?
> > > > >
> > > > > You need to either declare the output schema explicitly in the UDF
> or
> > > > > just use the 'AS' statement. For example, say you have a UDF that
> sums
> > > > > two numbers:
> > > > >
> > > > > data = LOAD 'foobar' AS (a:int, b:int);
> > > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> > > > > DUMP summed;
> > > > >
> > > > > --jacob
> > > > > @thedatachef
> > > > >
> > > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > > x = foreach g2 generate group, data.(size);
> > > > > > dump x;
> > > > > >
> > > > > > ((drm,0),{(464868)})
> > > > > > ((drm,1),{(464868)})
> > > > > > ((snezz,0),{(8073),(8073)})
> > > > > >
> > > > > > but:
> > > > > > x = foreach g2 generate group, SUM(data.size);
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > [log output and stack trace snipped; identical to the copy quoted earlier]
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
>
>
>
Re: SUM
Posted by Jacob Perkins <ja...@gmail.com>.
That changes things entirely. There's some weirdness in the way data is
read from Cassandra. Have you applied the latest patches (eg.
https://issues.apache.org/jira/browse/CASSANDRA-2387) ?
See also some UDFs for working with Cassandra data that Jeremy Hanna
(@jeromatron) wrote:
https://github.com/jeromatron/pygmalion
Best of luck!
--jacob
@thedatachef
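[Editorial note: for reference, pygmalion is typically used along these lines. This is a rough sketch; the jar path, UDF name, and argument format are assumptions based on the project's README, so check the repo before relying on them:]

```
REGISTER '/path/to/pygmalion.jar';
DEFINE FromCassandraBag org.pygmalion.udf.FromCassandraBag();

rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
       CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
-- pull named columns out of the bag and declare their types in one step
typed = FOREACH rows GENERATE key,
        FLATTEN(FromCassandraBag('domain,spam,size,time', columns))
        AS (domain:chararray, spam:int, size:long, time:float);
```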
On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
> Maybe I forgot one more thing: the rows are taken from Cassandra.
>
> rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
> CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
>
> I have no idea how to write an AS clause for a bag in a foreach.
>
>
> P.
>
> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>
> > Strange, that looks right to me. What happens if you try the 'AS'
> > statement anyhow?
> >
> > --jacob
> > @thedatachef
> >
> > On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > > Hello,
> > >
> > > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > > describe pom
> > > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> > >
> > > data = foreach pom generate flatten($0);
> > > grunt> describe data;
> > > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> > >
> > >
> > > I think they are cast fine, right?
> > >
> > > UDF is python one with decorator
> > > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
> > > time:float)}")
> > >
> > > Thanks
> > >
> > >
> > >
> > > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> > >
> > > > You're getting a 'ClassCastException' because the contents of the bags
> > > > are DataByteArray and not long (or cannot be cast to long). I suspect
> > > > that you're generating the contents of the bag in some way from a UDF,
> > > > no?
> > > >
> > > > You need to either declare the output schema explicitly in the UDF or
> > > > just use the 'AS' statement. For example, say you have a UDF that sums
> > > > two numbers:
> > > >
> > > > data = LOAD 'foobar' AS (a:int, b:int);
> > > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> > > > DUMP summed;
> > > >
> > > > --jacob
> > > > @thedatachef
> > > >
> > > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > > x = foreach g2 generate group, data.(size);
> > > > > dump x;
> > > > >
> > > > > ((drm,0),{(464868)})
> > > > > ((drm,1),{(464868)})
> > > > > ((snezz,0),{(8073),(8073)})
> > > > >
> > > > > but:
> > > > > x = foreach g2 generate group, SUM(data.size);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > [log output and stack trace snipped; identical to the copy quoted earlier]
> > > >
> > > >
> > > >
> >
> >
> >
Re: SUM
Posted by pob <pe...@gmail.com>.
Maybe I forgot one more thing: the rows are taken from Cassandra.
rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
I have no idea how to write an AS clause for a bag in a foreach.
P.
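[Editorial note: one way around the AS question is to cast the values as they are projected out of the bag, rather than re-declaring the bag's schema in the foreach. A sketch under the column layout above; the 'size' column name is taken from the describe output earlier in the thread:]

```
-- flatten the bag, then cast each value to the type you need
flat = FOREACH rows GENERATE key,
       FLATTEN(columns) AS (name:chararray, value:bytearray);
sizes = FILTER flat BY name == 'size';
typed = FOREACH sizes GENERATE key, (long) value AS size;
```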
2011/4/24 Jacob Perkins <ja...@gmail.com>
> Strange, that looks right to me. What happens if you try the 'AS'
> statement anyhow?
>
> --jacob
> @thedatachef
>
> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> > Hello,
> >
> > pom = foreach rows generate myUDF.toTuple($1); -- reading data
> > describe pom
> > pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
> >
> > data = foreach pom generate flatten($0);
> > grunt> describe data;
> > data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
> >
> >
> > I think they are cast fine, right?
> >
> > UDF is python one with decorator
> > @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
> > time:float)}")
> >
> > Thanks
> >
> >
> >
> > 2011/4/24 Jacob Perkins <ja...@gmail.com>
> >
> > > You're getting a 'ClassCastException' because the contents of the bags
> > > are DataByteArray and not long (or cannot be cast to long). I suspect
> > > that you're generating the contents of the bag in some way from a UDF,
> > > no?
> > >
> > > You need to either declare the output schema explicitly in the UDF or
> > > just use the 'AS' statement. For example, say you have a UDF that sums
> > > two numbers:
> > >
> > > data = LOAD 'foobar' AS (a:int, b:int);
> > > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> > > DUMP summed;
> > >
> > > --jacob
> > > @thedatachef
> > >
> > > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > > x = foreach g2 generate group, data.(size);
> > > > dump x;
> > > >
> > > > ((drm,0),{(464868)})
> > > > ((drm,1),{(464868)})
> > > > ((snezz,0),{(8073),(8073)})
> > > >
> > > > but:
> > > > x = foreach g2 generate group, SUM(data.size);
> > > >
> > > >
> > > >
> > > >
Re: SUM
Posted by Jacob Perkins <ja...@gmail.com>.
Strange, that looks right to me. What happens if you try the 'AS'
statement anyhow?
--jacob
@thedatachef
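For instance, reusing the aliases from your mail (the grouping key is my guess from the ((drm,0),...) output you showed):

```pig
-- Sketch: apply the schema at the flatten step so 'size' is a typed long
data = FOREACH pom GENERATE
       FLATTEN($0) AS (domain: chararray, spam: int, size: long, time: float);
g2   = GROUP data BY (domain, spam);  -- grouping key guessed from the output shown
x    = FOREACH g2 GENERATE group, SUM(data.size);
```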
On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
> Hello,
>
> pom = foreach rows generate myUDF.toTuple($1); -- reading data
> describe pom
> pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
>
> data = foreach pom generate flatten($0);
> grunt> describe data;
> data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
>
>
> I think they are cast fine, right?
>
> The UDF is a Python one, with the decorator
> @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
> time:float)}")
>
> Thanks
>
>
>
> 2011/4/24 Jacob Perkins <ja...@gmail.com>
>
> > You're getting a 'ClassCastException' because the contents of the bags
> > are DataByteArray and not long (or cannot be cast to long). I suspect
> > that you're generating the contents of the bag in some way from a UDF,
> > no?
> >
> > You need to either declare the output schema explicitly in the UDF or
> > just use the 'AS' statement. For example, say you have a UDF that sums
> > two numbers:
> >
> > data = LOAD 'foobar' AS (a:int, b:int);
> > summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> > DUMP summed;
> >
> > --jacob
> > @thedatachef
> >
> > On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > > x = foreach g2 generate group, data.(size);
> > > dump x;
> > >
> > > ((drm,0),{(464868)})
> > > ((drm,1),{(464868)})
> > > ((snezz,0),{(8073),(8073)})
> > >
> > > but:
> > > x = foreach g2 generate group, SUM(data.size);
> > >
> > >
> > >
> > >
Re: SUM
Posted by pob <pe...@gmail.com>.
Hello,
pom = foreach rows generate myUDF.toTuple($1); -- reading data
describe pom
pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
data = foreach pom generate flatten($0);
grunt> describe data;
data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
I think they are cast fine, right?
The UDF is a Python one, with the decorator
@outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long,
time:float)}")
Thanks
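A sketch of the explicit-cast variant I have in mind (the `cast` alias and field list are illustrative):

```pig
-- Sketch: cast explicitly in the foreach, in case the declared schema
-- is not actually applied to the runtime DataByteArray values.
data = FOREACH pom GENERATE FLATTEN($0);
cast = FOREACH data GENERATE y::domain AS domain, y::spam AS spam,
       (long) y::size AS size, (float) y::time AS time;
```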
2011/4/24 Jacob Perkins <ja...@gmail.com>
> You're getting a 'ClassCastException' because the contents of the bags
> are DataByteArray and not long (or cannot be cast to long). I suspect
> that you're generating the contents of the bag in some way from a UDF,
> no?
>
> You need to either declare the output schema explicitly in the UDF or
> just use the 'AS' statement. For example, say you have a UDF that sums
> two numbers:
>
> data = LOAD 'foobar' AS (a:int, b:int);
> summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
> DUMP summed;
>
> --jacob
> @thedatachef
>
> On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> > x = foreach g2 generate group, data.(size);
> > dump x;
> >
> > ((drm,0),{(464868)})
> > ((drm,1),{(464868)})
> > ((snezz,0),{(8073),(8073)})
> >
> > but:
> > x = foreach g2 generate group, SUM(data.size);
> >
> >
> >
> >
Re: SUM
Posted by Jacob Perkins <ja...@gmail.com>.
You're getting a 'ClassCastException' because the contents of the bags
are DataByteArray and not long (or cannot be cast to long). I suspect
that you're generating the contents of the bag in some way from a UDF,
no?
You need to either declare the output schema explicitly in the UDF or
just use the 'AS' statement. For example, say you have a UDF that sums
two numbers:
data = LOAD 'foobar' AS (a:int, b:int);
summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS sum:int;
DUMP summed;
--jacob
@thedatachef
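If I read the trace right, the declared schema says long (so Pig dispatches to LongSum), but the tuples still carry raw DataByteArray values at runtime. A sketch of the cast-first variant, using the aliases from the post below (`typed`, `g2t`, `ok` are illustrative names):

```pig
-- Sketch: cast before grouping so LongSum receives real longs
-- instead of DataByteArray.
typed = FOREACH data GENERATE domain, spam, (long) size AS size, time;
g2t   = GROUP typed BY (domain, spam);  -- grouping key guessed from the output
ok    = FOREACH g2t GENERATE group, SUM(typed.size);
```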
On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
> x = foreach g2 generate group, data.(size);
> dump x;
>
> ((drm,0),{(464868)})
> ((drm,1),{(464868)})
> ((snezz,0),{(8073),(8073)})
>
> but:
> x = foreach g2 generate group, SUM(data.size);
>
>
>
>
> 2011-04-24 18:02:18,910 [Thread-793] WARN
> org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error
> while computing sum in Initial
> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray
> cannot be cast to java.lang.Long
> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
> ... 14 more
> 2011-04-24 18:02:19,213 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_local_0038
> 2011-04-24 18:02:19,213 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2011-04-24 18:02:24,215 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_local_0038 has failed! Stop running all dependent jobs
> 2011-04-24 18:02:24,216 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2011-04-24 18:02:24,216 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats
> - Detected Local mode. Stats reported below may be incomplete
> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
>
>
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias x
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias x
> at org.apache.pig.PigServer.openIterator(PigServer.java:754)
> at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> at org.apache.pig.Main.run(Main.java:465)
> at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
> at org.apache.pig.PigServer.openIterator(PigServer.java:744)
> ... 7 more