Posted to user@pig.apache.org by Skanda <sk...@gmail.com> on 2014/05/09 07:14:42 UTC

issue applying variable to LIMIT

Hi All,

I have a use case where I need to get the top N distinct URLs for each
domain, based on their number of hits and latest timestamp. Please find
below a snippet of the Pig script I have written to do this.

prunedUrlData = FOREACH urlPatternData GENERATE
    (url_pattern is null ? url : url_pattern) AS url,  -- prefer the URL pattern when present
    domid, urlkey, urllen, puid, nwid, lmd, rc, punam, nwnam, ispub,
    com.xxx.GetDomainStorageLimit(nwid) AS domainlimit;

group_by_Domain_Url = GROUP prunedUrlData BY domid;

rankedUrlByDomain = FOREACH group_by_Domain_Url
{
    distinct_url = DISTINCT prunedUrlData;
    -- most recent lmd first, ties broken by highest rc
    url_rank_dom = ORDER distinct_url BY lmd DESC, rc DESC;
    -- keep at most domainlimit tuples per domain (this LIMIT is what fails)
    url_domain_limit = LIMIT url_rank_dom domainlimit;
    GENERATE FLATTEN(url_domain_limit);
};
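The nested FOREACH above can be modelled in plain Python to make the intended output concrete. This is only a sketch of the semantics, not of how Pig executes the script: `UrlRow` is a hypothetical record holding a subset of the fields, and choosing a single limit per domain with `max()` is an assumption. Some choice is needed because `domainlimit` is a field of every tuple in the `prunedUrlData` bag rather than of the grouped record.

```python
from collections import namedtuple

# Hypothetical record holding a subset of the prunedUrlData fields.
# lmd and rc are assumed to be numeric so they can be sorted descending.
UrlRow = namedtuple("UrlRow", ["url", "domid", "lmd", "rc", "domainlimit"])


def rank_urls_by_domain(rows):
    """Model of rankedUrlByDomain: for each domid, drop duplicate rows,
    order by lmd DESC then rc DESC, and keep the first N rows, where N is
    taken from the domainlimit values carried by that domain's rows."""
    groups = {}
    for row in rows:  # GROUP prunedUrlData BY domid
        groups.setdefault(row.domid, []).append(row)

    ranked = []
    for bag in groups.values():
        distinct_url = set(bag)  # DISTINCT prunedUrlData
        # ORDER distinct_url BY lmd DESC, rc DESC (order of exact ties is unspecified)
        url_rank_dom = sorted(distinct_url, key=lambda r: (-r.lmd, -r.rc))
        # domainlimit is stored on every row of the bag, not on the group,
        # so one value per domain has to be chosen; max() is an assumption.
        limit = max(r.domainlimit for r in bag)
        ranked.extend(url_rank_dom[:limit])  # LIMIT, then FLATTEN
    return ranked
```

Because `domainlimit` comes from `GetDomainStorageLimit(nwid)`, every row in a domain carries the same value whenever a `domid` maps to a single `nwid`, and then the choice of `max()` makes no difference.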


The only problem I have now is with the domainlimit value that I pass to
the LIMIT statement at runtime. I'm getting the following exception:

java.lang.RuntimeException: Unable to evaluate Limit expression: NULL
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:97)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:432)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:583)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

If I use a constant for the LIMIT, it works fine. I printed
"group_by_Domain_Url" to check whether I'm getting the domainlimit, and I
can see a value.

But when I apply it to LIMIT, it says "Unable to evaluate Limit
expression: NULL". Where am I going wrong?



Regards,
Skanda
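One possible workaround, offered as a hedged sketch rather than a confirmed fix: the Pig documentation describes the LIMIT count as a constant or a scalar expression, and a field of the tuples inside a nested bag does not appear to qualify, which would be consistent with the expression evaluating to NULL. Passing the ordered bag and the limit to a small Jython UDF avoids LIMIT altogether, because UDF arguments are evaluated for each grouped record. The file `urlfuncs.py`, the function `take_first_n`, the placeholder output schema, and the use of `MAX(prunedUrlData.domainlimit)` as the per-domain limit are all hypothetical.

```python
# urlfuncs.py: hypothetical Jython UDF module for Pig 0.11.
#
# Hypothetical usage in the Pig script:
#   REGISTER 'urlfuncs.py' USING jython AS urlfuncs;
#   rankedUrlByDomain = FOREACH group_by_Domain_Url {
#       distinct_url = DISTINCT prunedUrlData;
#       url_rank_dom = ORDER distinct_url BY lmd DESC, rc DESC;
#       GENERATE FLATTEN(urlfuncs.take_first_n(url_rank_dom,
#                                              MAX(prunedUrlData.domainlimit)));
#   };

# Pig's Jython engine provides the outputSchema decorator; this fallback
# only lets the module be imported and tested outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


# Placeholder schema: the field names come from prunedUrlData, but the types
# are omitted (bytearray) and should be replaced with the real ones.
@outputSchema("ranked:bag{t:tuple(url, domid, urlkey, urllen, puid, nwid, lmd, rc, "
              "punam, nwnam, ispub, domainlimit)}")
def take_first_n(bag, n):
    """Return the first n tuples of an already ordered bag.

    A null or non-positive limit keeps nothing; that is an assumption, so
    change it if a missing domain limit should instead mean "no limit".
    """
    if bag is None or n is None:
        return []
    # int() also accepts the long or double that MAX may return.
    n = int(n)
    if n <= 0:
        return []
    return list(bag)[:n]
```

Handling a null limit explicitly matters here because the reported error names NULL: if `GetDomainStorageLimit` returns null for some `nwid`, that domain yields an empty bag instead of failing the whole job.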

Re: Fwd: issue applying variable to LIMIT

Posted by Luis <bl...@gmail.com>.
Skanda <sk...@...> writes:

> But when I apply it to LIMIT, it says "Unable to evaluate Limit
> expression: NULL". Where am I going wrong?


I'm having exactly the same issue.
Have you been able to solve it?
I'm using Pig 0.11.

Please help.
Thanks.



Re: issue applying variable to LIMIT

Posted by Skanda <sk...@gmail.com>.
Hi,

I'm using Pig 0.11.0, which comes with CDH 4.4.0.

Thanks,
Skanda


On Fri, May 9, 2014 at 10:44 AM, Skanda <sk...@gmail.com> wrote:

> But when I apply it to LIMIT, it says "Unable to evaluate Limit
> expression: NULL". Where am I going wrong?