Posted to user@pig.apache.org by Sean Timm <se...@corp.aol.com> on 2009/04/20 23:27:46 UTC
UDF with parameterized constructor
PIG-546 indicates that it is now possible to pass arguments into a
custom UDF filter function via a parameterized constructor. I'm using a
TRUNK build from April 1 (svn rev. 761067), which appears to have the
patch applied, but I'm getting the same errors that the patch
describes. Should this work? Is there a better way to pass
parameters or configuration into a UDF filter function?

The parameterized constructor is called three times, and then the
default constructor is called four times.

On the Hadoop backend:

2009-04-20 17:11:29,935 ERROR com.aol.search.pig.udf.ValidateQuery: default constructor
2009-04-20 17:11:30,034 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.IllegalArgumentException: Can not create a Path from a null string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
	at org.apache.hadoop.fs.Path.<init>(Path.java:90)
	at com.aol.search.pig.udf.ValidateQuery.loadList(ValidateQuery.java:74)
	at com.aol.search.pig.udf.ValidateQuery.init(ValidateQuery.java:66)
	at com.aol.search.pig.udf.ValidateQuery.exec(ValidateQuery.java:91)
	at com.aol.search.pig.udf.ValidateQuery.exec(ValidateQuery.java:35)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:251)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:217)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

Thanks,
Sean
Re: UDF with parameterized constructor
Posted by Sean Timm <ti...@aol.com>.
The logic inside my exec() function is different from that of
FILTERFROMFILE.java, but the rest of my class differs very little, except
that I have two parameters. The other difference, and what causes my
FilterFunc implementation to fail, is the override of
getArgToFuncMapping(). I don't really need it, so I've commented it
out, and everything works fine now. I'm not sure why the override was a
problem, however.

/* (non-Javadoc)
 * @see org.apache.pig.EvalFunc#getArgToFuncMapping()
 * This is needed to make sure that both bytearrays and chararrays can
 * be passed as arguments
 */
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
    List<FuncSpec> funcList = new ArrayList<FuncSpec>();
    // This FuncSpec names only the class; it carries no constructor arguments.
    funcList.add(new FuncSpec(this.getClass().getName(),
            new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
    return funcList;
}
-Sean
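One possible explanation for why the override mattered, offered as an assumption and not confirmed anywhere in this thread: the getArgToFuncMapping() above returns a FuncSpec built from this.getClass().getName() alone, so if Pig uses that FuncSpec to re-create the function for the back end, the constructor arguments would be dropped and the default constructor used instead. That would fit the observed pattern of three parameterized-constructor calls followed by four default-constructor calls. The plain-Java sketch below mimics the idea with a hypothetical Spec class (not Pig's FuncSpec) and a hypothetical MyFilter UDF: a spec that carries arguments selects the two-argument constructor, while a class-name-only spec falls back to the no-argument constructor and leaves listPath null.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

public class SpecDemo {
    // Hypothetical UDF with a default and a two-argument constructor (not a Pig class).
    public static class MyFilter {
        final String listPath;
        final String mode;

        public MyFilter() {
            this(null, null);
        }

        public MyFilter(String listPath, String mode) {
            this.listPath = listPath;
            this.mode = mode;
        }
    }

    // Hypothetical, simplified spec: a class name plus String constructor arguments.
    static final class Spec {
        final String className;
        final String[] ctorArgs;

        Spec(String className, String... ctorArgs) {
            this.className = className;
            this.ctorArgs = ctorArgs;
        }

        // Picks the public constructor taking exactly ctorArgs.length Strings.
        Object instantiate() throws ReflectiveOperationException {
            Class<?> cls = Class.forName(className);
            Class<?>[] types = new Class<?>[ctorArgs.length];
            Arrays.fill(types, String.class);
            Constructor<?> ctor = cls.getConstructor(types);
            return ctor.newInstance((Object[]) ctorArgs);
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String name = MyFilter.class.getName();
        MyFilter withArgs = (MyFilter) new Spec(name, "/tmp/list.txt", "strict").instantiate();
        MyFilter classOnly = (MyFilter) new Spec(name).instantiate();
        System.out.println(withArgs.listPath);  // /tmp/list.txt
        System.out.println(classOnly.listPath); // null
    }
}
```

If this is the cause, keeping the override would mean returning a FuncSpec that also records the constructor arguments; check which FuncSpec constructors your Pig build actually provides before relying on that.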
Re: UDF with parameterized constructor
Posted by Alan Gates <ga...@yahoo-inc.com>.
Can you include the load function from your script to show how you're
using it? One issue is that you cannot define constructor arguments
for your load function in DEFINE; you have to do it in the LOAD
statement: USING X(args go here). Also, the load function is
instantiated on the user's box with the arguments passed to it in the
USING clause. It is then serialized and passed to the Hadoop machines,
where it is deserialized. At this point the default constructor is
called (because that's how Java deserializes objects). So if those
constructor arguments are needed on the back end, they need to be
cached when the function is constructed on the front end. You may need
to add logic to explicitly store the filename so it's available at run
time.
Alan.
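Caching the constructor arguments can be sketched in plain Java. The class below, CachedFilter, is hypothetical and uses no Pig or Hadoop types, and roundTrip() merely stands in for shipping the object to a back-end task using ObjectOutputStream/ObjectInputStream. Under standard java.io.Serializable, ordinary (non-transient) fields set by the constructor are written to the stream and restored on the receiving side, while transient, derived state such as the loaded list comes back null and has to be rebuilt lazily from the cached path. How a particular Pig version actually ships a UDF to the back end may differ, so treat this only as an illustration of the Java mechanism.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

public class CachedArgsDemo {
    // Hypothetical UDF-like class; not a Pig class.
    static class CachedFilter implements Serializable {
        private static final long serialVersionUID = 1L;

        // Constructor arguments kept in ordinary fields, so they are serialized.
        final String listPath;
        final String mode;

        // Derived state: not serialized, rebuilt lazily on first use.
        transient Set<String> list;

        CachedFilter(String listPath, String mode) {
            this.listPath = listPath;
            this.mode = mode;
        }

        boolean exec(String query) {
            if (list == null) {
                if (listPath == null) {
                    throw new IllegalStateException("listPath was never set");
                }
                // Placeholder for reading the file at listPath.
                list = new HashSet<>();
                list.add("bad query");
            }
            return !list.contains(query);
        }
    }

    // Serialize and deserialize, standing in for sending the object to a back-end task.
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        CachedFilter front = new CachedFilter("/tmp/list.txt", "strict");
        front.exec("warm up");                 // fills the transient list on the front end
        CachedFilter back = roundTrip(front);
        System.out.println(back.listPath);     // /tmp/list.txt
        System.out.println(back.list == null); // true: transient state is not shipped
        System.out.println(back.exec("pig"));  // true: list rebuilt from the cached path
    }
}
```

The design choice here is to keep only small, serializable inputs (paths and flags) in fields and to load the file contents lazily on the first exec() call, which is also a natural place for a null-path check.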