Posted to user@pig.apache.org by Charles Gonçalves <ch...@gmail.com> on 2011/02/11 02:42:42 UTC

Error when using a UDF with a FILTER statement

I'm just trying to do a breakdown of all my logs, but every time I use an
operation like:
FILTER alias BY some_udf(alias);
I get a problem.

First I got: ERROR 0: Scalar has more than one row in the output.

cfgmc@phoebe:~/workspace-java/MscPigScripts/scripts (121) 23:11:16
scripts:> pig -x local
grunt> REGISTER
/home/speed/cfgmc/workspace-java/MscPigScripts/jar/MscPigUtils.jar
grunt>
grunt> -- Functions Definitions
grunt> DEFINE EdgeLoader msc.pig.EdgeLoader();
grunt> DEFINE valid msc.pig.IsValidUrl();
grunt> raw = LOAD '../inputTestes/wpc_sample.gz' using EdgeLoader;
grunt> Describe raw
raw: {ts: long,timeTaken: int,cIp: chararray,fSize: long,sIp:
chararray,sPort: int,scStatus: chararray,scBytes: long,csMethod:
chararray,url: chararray,rsDuration: int,rsBytes: int,referrer:
chararray,ua: chararray,edgeId: chararray}
grunt> B = FOREACH raw GENERATE cIp,url ;
grunt> describe B;
B: {cIp: chararray,url: chararray}
grunt> *C = FILTER B BY valid(B.url);*
grunt> describe C;
C: {cIp: chararray,url: chararray}
grunt> D = GROUP C BY B.cIp;
grunt> describe D;
D: {group: chararray,C: {cIp: chararray,url: chararray}}
grunt> urls_ok = FOREACH D GENERATE COUNT(C.url);
grunt> describe urls_ok;
urls_ok: {long}
grunt> dump urls_ok;


org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output.
1st : (187.113.41.93, http://webcast.sambatech.com.br/000482/account/8/3/ed92827f3e722bfbbabf89aa4adb0068/ER7_FA_3009_CARRASCONANYDIF_470kbps_2010-09-30.mp4),
2nd : (186.213.248.23, http://webcast.sambatech.com.br/000482/account/8/3/thumbnail/media/ea41d211f4e277821cb3e9fd392a51cf/R7_CH_TINAROMA_EMAILR7FAZENDA_470kbps_2010-09-140.03426408348605037.jpg)
    at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:89)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:325)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:169)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:212)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:289)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:127)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Then I got:

grunt> REGISTER
/home/speed/cfgmc/workspace-java/MscPigScripts/jar/MscPigUtils.jar
grunt> DEFINE EdgeLoader msc.pig.EdgeLoader();
grunt> DEFINE valid msc.pig.IsValidUrl();
grunt> raw = LOAD '../inputTestes/wpc_sample.gz' using EdgeLoader;
grunt> B = FOREACH raw GENERATE cIp, sIp, sPort, scStatus, csMethod,
scBytes, url ;
grunt> describe B;
B: {cIp: chararray,sIp: chararray,sPort: int,scStatus: chararray,csMethod:
chararray,scBytes: long,url: chararray}
grunt> E = GROUP B ALL ;
grunt> describe E;
E: {group: chararray,B: {cIp: chararray,sIp: chararray,sPort: int,scStatus:
chararray,csMethod: chararray,scBytes: long,url: chararray}}

grunt> edge_breakdown = FOREACH E {
>> dist_cIps = DISTINCT B.cIp;
>> dist_sIps = DISTINCT B.sIp;
>> *urls_ok = FILTER B BY valid(B.url);*
>> GENERATE COUNT(dist_cIps),COUNT(dist_sIps) ,COUNT(urls_ok.url),
COUNT(B.url), SUM(B.scBytes);
>> }
grunt> DESC

DESC       DESCRIBE
grunt> DESCRIBE edge_breakdown;
2011-02-10 23:36:35,274 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-02-10 23:36:35,301 [main] ERROR org.apache.pig.impl.plan.OperatorPlan - Attempt to connect operator urls_ok: Filter 1-196 which is not in the plan.
2011-02-10 23:36:35,302 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2219: Unable to process scalar in the plan
Details at logfile: /home/speed/cfgmc/workspace-java/MscPigScripts/scripts/pig_1297388063472.log
grunt>

The log file says:

Pig Stack Trace
---------------
ERROR 2219: Unable to process scalar in the plan

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to describe schema for alias edge_breakdown
    at org.apache.pig.PigServer.dumpSchema(PigServer.java:653)
    at org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:236)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:315)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2219: Unable to process scalar in the plan
    at org.apache.pig.PigServer.mergeScalars(PigServer.java:1299)
    at org.apache.pig.PigServer.compileLp(PigServer.java:1304)
    at org.apache.pig.PigServer.compileLp(PigServer.java:1241)
    at org.apache.pig.PigServer.dumpSchema(PigServer.java:639)
    ... 7 more
Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to connect operator urls_ok: Filter 1-196 which is not in the plan.
    at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:409)
    at org.apache.pig.impl.plan.OperatorPlan.createSoftLink(OperatorPlan.java:210)
    at org.apache.pig.PigServer.mergeScalars(PigServer.java:1294)
    ... 10 more
================================================================================

If I run the last script without the FILTER inside the nested FOREACH, it
works perfectly. The UDF also works fine in other contexts.

Guys, seriously, what am I missing here?
I've been stuck on this all day!


-- 
*Charles Ferreira Gonçalves *
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.:  55 31 34741485
Lab.: 55 31 34095840

Re: Error when using a UDF with a FILTER statement

Posted by Charles Gonçalves <ch...@gmail.com>.
Thanks Dmitriy, that worked!
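For anyone who lands on this thread later, the same rule applies inside the nested FOREACH from the second script above. A sketch of the corrected version (untested here; inside the nested block the UDF takes the column directly):

```pig
-- Sketch (untested): same aliases as the script above.
edge_breakdown = FOREACH E {
    dist_cIps = DISTINCT B.cIp;
    dist_sIps = DISTINCT B.sIp;
    urls_ok   = FILTER B BY valid(url);  -- was: valid(B.url)
    GENERATE COUNT(dist_cIps), COUNT(dist_sIps), COUNT(urls_ok.url),
             COUNT(B.url), SUM(B.scBytes);
}
```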






Re: Error when using a UDF with a FILTER statement

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Charles,
If you are iterating through a relation, you don't need to refer to it
in the statement.

meaning:

C = FILTER B BY valid(B.url);

should be

C = FILTER B BY valid(url);

(you already have access to the rows, not to the relation B).

The error you are getting comes from a new feature that lets you treat a
relation as a scalar and use that scalar value transparently while
iterating over another relation, e.g.:

total = foreach (group stuff all) generate COUNT($1) as cnt;
percent = foreach (group stuff by type) generate COUNT($1) / total.cnt;

Here, I am using the "total" relation as a single-row relation,
essentially promising Pig that total.cnt is only a single value.
In your case you are doing that to a multi-row relation, and things blow up.
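The same correction carries over to the rest of your first script, where the GROUP statement has the analogous problem: it should reference a column of the relation being grouped, not qualify it through an earlier alias. A sketch (untested, using your aliases):

```pig
-- Sketch (untested): column references are relative to the relation
-- each statement operates on.
C = FILTER B BY valid(url);    -- not valid(B.url)
D = GROUP C BY cIp;            -- not B.cIp: we are grouping C
urls_ok = FOREACH D GENERATE group, COUNT(C.url);
```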

D
