Posted to user@pig.apache.org by Dipesh Kumar Singh <di...@gmail.com> on 2012/09/19 20:11:59 UTC

Two or more arguments in udf

I need to pass two or more arguments to my UDF so that it can process the
data in those arguments.

I am unable to map the input data schema and am getting
*ERROR 1045: Could not infer the matching function ........*

I have tried playing with outputSchema but am going wrong somewhere.


Say I have a dataset like this:

1,Ron,CA,2012-03-01,1990-01-04
2,John,NY,2012-05-11,1994-08-12

----Myscript.pig--

data = LOAD '/input.txt' using PigStorage(',') as (f1,f2,f3,f4,f5);
rel1 = foreach data generate f2, *MyUdf( f4 , f5 )*;
dump rel1;
--------------------------

=====MyUdf.java========

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.FuncSpec;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class MyUdf extends EvalFunc<String> {
    // Concatenates the two arguments passed from the script.
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            String str1 = (String) input.get(1);
            return str + str1;
        } catch (Exception e) {
            return "Error in udf";
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            Schema tupleSchema = new Schema();
            tupleSchema.add(input.getField(0));
            tupleSchema.add(input.getField(1));
            return new Schema(new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    tupleSchema, DataType.TUPLE));
        } catch (Exception e) {
            return null;
        }
    }

    @Override
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
        funcList.add(new FuncSpec(this.getClass().getName(),
                new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
        return funcList;
    }
}


Thanks,
-- 
Dipesh Kr. Singh

Re: Two or more arguments in udf

Posted by Dipesh Kumar Singh <di...@gmail.com>.
Thanks Cheolsoo! It worked.
I understood the mistake now.

Thanks,
Dipesh.




Re: Two or more arguments in udf

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi,

Please change your getArgToFuncMapping() as follows:

    @Override
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
        Schema tupleSchema = new Schema();
        tupleSchema.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
        tupleSchema.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
        funcList.add(new FuncSpec(this.getClass().getName(), tupleSchema));
        return funcList;
    }

You're passing two args, so you need two chararrays in the input schema.
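
To picture why the original single-field schema fails, here is a minimal, self-contained sketch in plain Java. It is only a simplified model, not Pig's actual matching code: the class ArgMatchSketch, its Type enum, and the fits method are hypothetical names made up for illustration. It assumes that a call fits a declared signature only when the argument counts agree, and that untyped fields (which Pig loads as bytearray) can be cast to the declared type.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for argument matching; NOT Pig's real implementation.
final class ArgMatchSketch {

    // Hypothetical subset of Pig's data types, for illustration only.
    enum Type { BYTEARRAY, CHARARRAY, INTEGER }

    // A call fits a declared signature only if the argument counts agree and
    // every argument either has the declared type or is an untyped bytearray
    // (assumed here to be castable to the declared type).
    static boolean fits(List<Type> declared, List<Type> actual) {
        if (declared.size() != actual.size()) {
            return false; // arity mismatch, as in the original UDF
        }
        for (int i = 0; i < declared.size(); i++) {
            Type want = declared.get(i);
            Type got = actual.get(i);
            if (got != want && got != Type.BYTEARRAY) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // f4 and f5 are loaded without declared types, so both are bytearray.
        List<Type> call = Arrays.asList(Type.BYTEARRAY, Type.BYTEARRAY);

        List<Type> original = Arrays.asList(Type.CHARARRAY);
        List<Type> corrected = Arrays.asList(Type.CHARARRAY, Type.CHARARRAY);

        System.out.println("one chararray declared:  " + fits(original, call));
        System.out.println("two chararrays declared: " + fits(corrected, call));
    }
}
```

In this model the one-chararray declaration prints false for the call MyUdf(f4, f5) and the two-chararray declaration prints true, which mirrors the change to getArgToFuncMapping() above.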

Thanks,
Cheolsoo


Re: Two or more arguments in udf

Posted by Arun Ahuja <aa...@gmail.com>.
Just some obvious checks:

I assume there is a REGISTER statement at the top of the script, and that
you either use the full package name in the function call
("org.apache..udfs.MyUdf") or have a DEFINE statement above?  Also, what
are the asterisks for?
