Posted to user@pig.apache.org by Alexander Schätzle <al...@yahoo.com> on 2010/05/22 18:10:31 UTC

Aliases in EvalFunction

Hi all,

In the exec(Tuple input) method of an EvalFunc, I get the tuple to be processed by the eval function.
Is there a reliable way to get the aliases of the fields in the input tuple?
My eval function needs to know the field aliases in order to know how to evaluate the tuple.

My current solution is to save the Schema that I get in the outputSchema(Schema input) method in a static variable of the eval function class, so that I can access the schema information in the exec method:

private static Schema inputSchema;
public Schema outputSchema(Schema input) {
        inputSchema = input;
        return input;
}

public Tuple exec(Tuple input) throws IOException {
        Set<String> aliases = inputSchema.getAliases();
        ...
        return input;
}

Local tests worked, but I'm not sure whether this is guaranteed to work.

Example:

Describe A;
A: {varName1: chararray, varName2: chararray}

B = Foreach A Generate MyUDF(*);


In MyUDF I have to know the names of the first and second fields (varName1 and varName2) because the processing depends on the field names.
Any suggestions?

Thx in advance,
Alex


Re: Aliases in EvalFunction

Posted by Alexander Schätzle <al...@yahoo.com>.
I'm using the CDH3 beta version of the Cloudera Distribution of Hadoop. This version includes Pig 0.5.0.
Is there no way in Pig 0.5.0 to get the aliases of the tuple fields in an evaluation function? In my opinion this is a very important feature!

Alex



________________________________
From: Alan Gates <ga...@yahoo-inc.com>
To: pig-user@hadoop.apache.org
Sent: Monday, May 24, 2010, 18:26:22
Subject: Re: Aliases in EvalFunction

UDFs are not serialized from the frontend to the backend, so simply saving the schema into a variable will not work.

What version of Pig are you using?  In 0.6 and later the UDFContext class exists to allow UDFs to carry information like this from the frontend to the backend.

Alan.



Re: Aliases in EvalFunction

Posted by Alan Gates <ga...@yahoo-inc.com>.
UDFs are not serialized from the frontend to the backend, so simply saving the schema into a variable will not work.

What version of Pig are you using? In 0.6 and later the UDFContext class exists to allow UDFs to carry information like this from the frontend to the backend.

Alan.
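
Editor's sketch of the UDFContext route mentioned above, under some assumptions. In Pig 0.6 and later, UDFContext.getUDFContext().getUDFProperties(getClass()) should hand the UDF a java.util.Properties object that Pig carries from the frontend to the backend; please check the Javadoc of your Pig version before relying on that. Properties hold strings, so the aliases have to be packed into one. The AliasCodec class and its property key below are hypothetical names invented for this illustration, not part of Pig, and the sketch assumes every field has an alias and that no alias contains a comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Pig): packs field aliases into a single
// string property so they can travel inside a java.util.Properties object.
final class AliasCodec {
    // Hypothetical property key, chosen for this sketch only.
    static final String KEY = "myudf.input.aliases";

    private AliasCodec() {}

    // Frontend side: store the aliases in the order the caller supplies them.
    // Assumes no alias is null and no alias contains a comma.
    static void store(Properties props, List<String> aliases) {
        props.setProperty(KEY, String.join(",", aliases));
    }

    // Backend side: return the stored aliases in the same order,
    // or an empty list if nothing was stored.
    static List<String> load(Properties props) {
        String packed = props.getProperty(KEY);
        if (packed == null || packed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(packed.split(",")));
    }
}
```

In a UDF, the idea would be to call AliasCodec.store(...) from outputSchema(Schema input) with the aliases read from the input schema, and AliasCodec.load(...) from exec(Tuple input), in both cases passing the Properties object obtained from UDFContext. This replaces the static field, which lives only in the frontend JVM.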
