Posted to user@pig.apache.org by DIPESH KUMAR SINGH <di...@gmail.com> on 2012/05/09 21:00:44 UTC

Many to One UDF Problem

(Yet another basic udf question)

I want my UDF to take the values of all the columns in a row.

For example, suppose my input file has 3 tab-delimited records:

John   12
Jeff     33
Chin    20

Currently my UDF can only take one column (I don't know how to pass more
than one):

register 'dudf.jar';
player = load '/pig_data/dxmlsample1.txt' as (name:chararray, id:chararray);
-- As I have only passed name here, I want the whole row to be passed, i.e. name and id.
unintended = foreach player generate name, id, Dudf_try(name);
dump unintended;

My UDF code is:

import java.io.IOException;
import java.util.List;
import java.util.ArrayList;

import org.apache.pig.EvalFunc;
import org.apache.pig.FuncSpec;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.FrontendException;

public class Dudf_try extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String query = (String)input.get(0);
            //String query1 = (String)input.get(1);

            // Some more transformation here, but ultimate output is String

            return query+"<>"+query1;
        } catch (Exception e) {
            System.err.println("failed to process input; error - " + e.getMessage());
            return null;
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(
                getSchemaName(this.getClass().getName().toLowerCase(), input),
                DataType.CHARARRAY));
    }

    @Override
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
        funcList.add(new FuncSpec(this.getClass().getName(),
                new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));

        return funcList;
    }

}

I need some suggestions on how to achieve this.

Thanks!
Dipesh

-- 
Dipesh Kr. Singh

Re: Many to One UDF Problem

Posted by DIPESH KUMAR SINGH <di...@gmail.com>.
Thanks Prashant, Russell.

I was PigStorage while loading tab delimited file. Now it's running fine.

Regards,
Dipesh

Re: Many to One UDF Problem

Posted by Prashant Kommireddi <pr...@gmail.com>.
I messed up, your original UDF does not need to be changed.

Just pass in all fields (*) as I suggested in my previous email, and access
them the way you were doing it before:
String query = (String)input.get(0);
String query1 = (String)input.get(1);

That should work.

-Prashant
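
A minimal sketch of that approach, assuming the UDF is called as Dudf_try(*) so that name and id arrive as positions 0 and 1 of the input. To keep it runnable without Pig on the classpath, a plain List<Object> stands in for Pig's Tuple, and the class name RowConcatSketch is hypothetical; in the real UDF the same logic would sit inside exec(Tuple input).

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the body of Dudf_try.exec(Tuple). A List<Object> plays the
// role of Pig's Tuple so the sketch compiles without Pig (an assumption made
// for illustration only).
class RowConcatSketch {

    // Each column passed to the UDF is one element of the input, in order.
    static String exec(List<Object> input) {
        if (input == null || input.size() < 2) {
            return null; // not enough fields to build the result
        }
        Object name = input.get(0);
        Object id = input.get(1);
        if (name == null || id == null) {
            return null; // a missing field yields a null result
        }
        // Same output format as the original UDF: name<>id
        return name.toString() + "<>" + id.toString();
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.<Object>asList("John", "12")));
    }
}
```

Calling toString() instead of casting to String is a choice made for this sketch: it avoids a ClassCastException if a field turns out not to be a String.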



Re: Many to One UDF Problem

Posted by Russell Jurney <ru...@gmail.com>.
It seems you want to group your data, then feed this group into the
UDF for processing. Look at SUM and AVG, I think, for examples?

Russell Jurney http://datasyndrome.com


Re: Many to One UDF Problem

Posted by DIPESH KUMAR SINGH <di...@gmail.com>.
The MapReduce job runs now, but the string output of the UDF is missing.
The result looks like this:

(Jeff,13,)
(John,12,)

Maybe something needs to be changed in the output schema. This is what I
had earlier:

@Override
public Schema outputSchema(Schema input) {
    return new Schema(new Schema.FieldSchema(
            getSchemaName(this.getClass().getName().toLowerCase(), input),
            DataType.CHARARRAY));
}

Thanks,
Dipesh




-- 
Dipesh Kr. Singh

Re: Many to One UDF Problem

Posted by Prashant Kommireddi <pr...@gmail.com>.
public List<FuncSpec> getArgToFuncMapping() throws FrontendException
needs to be modified accordingly, since you are now passing your UDF
the entire tuple. You don't really need to implement it if there is no
overloaded function.

Sent from my iPhone

On May 9, 2012, at 6:19 PM, DIPESH KUMAR SINGH <di...@gmail.com> wrote:

> public List<FuncSpec> getArgToFuncMapping() throws FrontendException
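
The mismatch Prashant describes, and the "none of them fit" wording of ERROR 1045 reported in this thread, can be pictured with a toy model. This is only a sketch and not Pig's implementation: type names are plain strings, only exact matches are considered, and the class ArgMappingSketch with its declare and resolve methods is hypothetical. The declared list plays the role of what getArgToFuncMapping() returns.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of choosing a UDF implementation by argument types.
// NOT Pig's algorithm; all names here are hypothetical.
class ArgMappingSketch {

    // Declared signatures mapped to the implementation that handles them.
    private final Map<List<String>, String> declared = new LinkedHashMap<>();

    void declare(String implementation, String... argTypes) {
        declared.put(Arrays.asList(argTypes), implementation);
    }

    // Returns the implementation whose declared types equal the call's
    // types, or null when no declared signature fits.
    String resolve(String... callTypes) {
        return declared.get(Arrays.asList(callTypes));
    }

    public static void main(String[] args) {
        ArgMappingSketch udf = new ArgMappingSketch();
        udf.declare("Dudf_try", "chararray"); // one chararray, as in the original UDF

        System.out.println(udf.resolve("chararray"));              // Dudf_try
        System.out.println(udf.resolve("chararray", "chararray")); // null
    }
}
```

In this model, a call with two chararray arguments finds no match because only a one-argument signature was declared. That is consistent with the advice above: either declare a signature that matches what is passed, or leave getArgToFuncMapping() out when there is no overloading.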

Re: Many to One UDF Problem

Posted by DIPESH KUMAR SINGH <di...@gmail.com>.
Prashant,

I followed your directions, but I am getting the following error:

2012-05-10 06:40:38,097 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1045: Could not infer the matching function for Dudf_try as multiple
or none of them fit. Please use an explicit cast.

The detailed error log and code are attached.

Thanks,
Dipesh




-- 
Dipesh Kr. Singh

Re: Many to One UDF Problem

Posted by Prashant Kommireddi <pr...@gmail.com>.
Dipesh,

You can pass in the entire tuple (row) to the UDF.

unintended = foreach player generate name, id, Dudf_try(*);

The UDF will then be able to use the entire row:

Tuple tuple = (Tuple)input.get(0);

To process individual fields, you can iterate or positionally access the
above tuple.

String name = tuple.get(0).toString();
String id = tuple.get(1).toString();

-Prashant

