Posted to user@pig.apache.org by Sean Timm <se...@corp.aol.com> on 2009/04/20 23:27:46 UTC

UDF with parameterized constructor

PIG-546 indicates that it is now possible to pass arguments into a 
custom UDF filter function via a parameterized constructor.  I'm using a 
TRUNK build from April 1 (svn rev. 761067) which appears to have the 
patch applied, but I'm getting the same errors that the patch 
describes.  Should this work?  Is there a better way to pass 
parameters/configuration into a UDF filter function?

The parameterized constructor is called 3 times, followed by the default 
constructor being called 4 times.

On the Hadoop backend:

2009-04-20 17:11:29,935 ERROR com.aol.search.pig.udf.ValidateQuery: default constructor
2009-04-20 17:11:30,034 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.IllegalArgumentException: Can not create a Path from a null string

	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
	at org.apache.hadoop.fs.Path.<init>(Path.java:90)
	at com.aol.search.pig.udf.ValidateQuery.loadList(ValidateQuery.java:74)
	at com.aol.search.pig.udf.ValidateQuery.init(ValidateQuery.java:66)
	at com.aol.search.pig.udf.ValidateQuery.exec(ValidateQuery.java:91)
	at com.aol.search.pig.udf.ValidateQuery.exec(ValidateQuery.java:35)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:251)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:217)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

Thanks,
Sean


Re: UDF with parameterized constructor

Posted by Sean Timm <ti...@aol.com>.
The logic inside my exec() function is different from that of
FILTERFROMFILE.java, but the rest of my class differs very little, except
that I take two parameters.  The other difference, and the one that
causes my FilterFunc implementation to fail, is the override of
getArgToFuncMapping().  I don't really need it, so I've commented it
out and everything works fine now.  I'm not sure why the override was a
problem, however.

  /* (non-Javadoc)
   * @see org.apache.pig.EvalFunc#getArgToFuncMapping()
   * This is needed to make sure that both bytearrays and chararrays can
   * be passed as arguments
   */
  @Override
  public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
      List<FuncSpec> funcList = new ArrayList<FuncSpec>();
      funcList.add(new FuncSpec(this.getClass().getName(),
              new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));

      return funcList;
  }

-Sean

Re: UDF with parameterized constructor

Posted by Alan Gates <ga...@yahoo-inc.com>.
Can you include the load function from your script to show how you're
using it?  One issue is that you cannot define constructor arguments
for your load function in DEFINE; you have to do it in LOAD, as in
USING X(args go here).  Also, the load function is called on the user's
box with the arguments passed to it in the USING clause.  It is then
serialized and passed to the Hadoop machines, where it is
deserialized.  At this point the default constructor is called
(because that's how Java deserializes objects).  So if those
constructor arguments are needed on the backend, they need to be cached
when the function is constructed on the front end.  You may need to
add logic to explicitly store the filename so it's available at run
time.

Alan.
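
The effect described above can be sketched in plain Java without any Pig
classes.  This is only a minimal illustration, assuming the backend rebuilds
the function reflectively from its class name plus whatever string arguments
were recorded with it; it is not Pig's actual instantiation code.  ListFilter,
instantiate(), and the file name badwords.txt are hypothetical names
introduced for the example.

```java
import java.util.Arrays;

public class CtorArgsSketch {

    /** Hypothetical stand-in for a filter UDF that needs a file name at run time. */
    public static class ListFilter {
        private final String listPath;

        // Default constructor: nothing is known about the list file.
        public ListFilter() {
            this.listPath = null;
        }

        // Parameterized constructor: remembers the file name for later use.
        public ListFilter(String listPath) {
            this.listPath = listPath;
        }

        public String getListPath() {
            return listPath;
        }
    }

    /**
     * Rebuilds an object from a class name and optional string arguments
     * (assumption: this mimics what a backend could do; not Pig's real code).
     */
    public static Object instantiate(String className, String... ctorArgs) throws Exception {
        Class<?> cls = Class.forName(className);
        if (ctorArgs.length == 0) {
            // Only the class name travelled, so only the default constructor can run.
            return cls.getDeclaredConstructor().newInstance();
        }
        // All recorded arguments are strings, so look up an all-String constructor.
        Class<?>[] types = new Class<?>[ctorArgs.length];
        Arrays.fill(types, String.class);
        return cls.getDeclaredConstructor(types).newInstance((Object[]) ctorArgs);
    }

    public static void main(String[] args) throws Exception {
        String name = ListFilter.class.getName();
        ListFilter lost = (ListFilter) instantiate(name);
        ListFilter kept = (ListFilter) instantiate(name, "badwords.txt");
        System.out.println("without args: " + lost.getListPath()); // prints "without args: null"
        System.out.println("with args: " + kept.getListPath());    // prints "with args: badwords.txt"
    }
}
```

When only the class name is carried over, the path stays null, which would be
consistent with the "Can not create a Path from a null string" error in the
original report.  Whether that is what happened in this particular case is an
assumption, not something the thread confirms.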
