Posted to user@pig.apache.org by pRaShAnT <pr...@gmail.com> on 2011/11/16 23:16:38 UTC

Incorrect outputSchema is invoked when overloading UDF in 0.9.1

Hi,

When I overload a UDF via getArgToFuncMapping(), the outputSchema() of the
parent/root UDF is called instead of the one on the UDF that was selected.

*LogFieldValue*
    @Override
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
        List<FuncSpec> funcList = new ArrayList<FuncSpec>();

        // (tuple, chararray) -> handled by this class, LogFieldValue
        Schema s = new Schema();
        s.add(new Schema.FieldSchema(null, DataType.TUPLE));
        s.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
        funcList.add(new FuncSpec(this.getClass().getName(), s));

        // (tuple, tuple) -> delegated to LogFieldValues
        s = new Schema();
        s.add(new Schema.FieldSchema(null, DataType.TUPLE));
        s.add(new Schema.FieldSchema(null, DataType.TUPLE));
        funcList.add(new FuncSpec(LogFieldValues.class.getName(), s));

        return funcList;
    }

*LogFieldValue -> returns a CHARARRAY*
    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY));
    }

*LogFieldValues -> returns a TUPLE*
    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(null, DataType.TUPLE));
    }


The problem I am seeing is this: when I call LogFieldValue with input
(tuple, tuple), it should invoke the overloaded function LogFieldValues.
It does invoke LogFieldValues, but validation fails when the return type
is checked through outputSchema().
*For example, the following statement results in an error:*
vLogFields = FOREACH A GENERATE FLATTEN(LFV(TOTUPLE(*), ('timestamp',
'runTime', 'cpuTime', 'oracleStatCpuTime', 'userId', 'organizationId') ))
as (ts:bytearray, runTime:bytearray, cpuTime:bytearray,
oracleStatCpuTime:bytearray, userId, orgId);

*Error:*
pig script failed to validate:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031:
Incompatable schema: left is
"ts:bytearray,runTime:bytearray,cpuTime:bytearray,oracleStatCpuTime:bytearray,userId:NULL,orgId:NULL",
right is ":chararray

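To make the behavior I expect concrete, here is a minimal Pig-free sketch of
the dispatch. Everything in it (Type, Udf, Mapping, mappings(), resolve()) is a
hypothetical stand-in I made up for illustration and is not Pig's actual API.
It also assumes exact signature matching only, which is probably simpler than
what Pig really does.

```java
import java.util.Arrays;
import java.util.List;

// Pig-free model of overload resolution. Every type here is a hypothetical
// stand-in and not part of the real org.apache.pig API.
class OverloadSketch {

    // Stand-in for the DataType constants used in the schemas above.
    enum Type { TUPLE, CHARARRAY }

    // Stand-in for a UDF: its class name and the type its outputSchema() reports.
    static final class Udf {
        final String name;
        final Type outputType;

        Udf(String name, Type outputType) {
            this.name = name;
            this.outputType = outputType;
        }
    }

    // Stand-in for a FuncSpec: an argument signature and the UDF it selects.
    static final class Mapping {
        final List<Type> argTypes;
        final Udf target;

        Mapping(Udf target, Type... argTypes) {
            this.target = target;
            this.argTypes = Arrays.asList(argTypes);
        }
    }

    // Mirrors the two entries returned by getArgToFuncMapping() above.
    static List<Mapping> mappings() {
        Udf single = new Udf("LogFieldValue", Type.CHARARRAY);
        Udf multi = new Udf("LogFieldValues", Type.TUPLE);
        return Arrays.asList(
            new Mapping(single, Type.TUPLE, Type.CHARARRAY),
            new Mapping(multi, Type.TUPLE, Type.TUPLE));
    }

    // Returns the UDF whose declared signature exactly matches the arguments.
    static Udf resolve(List<Mapping> mappings, List<Type> args) {
        for (Mapping m : mappings) {
            if (m.argTypes.equals(args)) {
                return m.target;
            }
        }
        throw new IllegalArgumentException("No overload for " + args);
    }

    public static void main(String[] args) {
        Udf chosen = resolve(mappings(), Arrays.asList(Type.TUPLE, Type.TUPLE));
        // Expected behavior: the output type comes from the resolved UDF,
        // not from the root UDF that declared the mapping.
        System.out.println(chosen.name + " -> " + chosen.outputType);
    }
}
```

In this model a (tuple, tuple) call resolves to LogFieldValues and takes its
TUPLE output type. The error above suggests that Pig instead validated the AS
clause against the CHARARRAY schema of LogFieldValue.
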
I am using Pig 0.9.1. Are there any known issues, or am I missing something
here?

Thanks,
Prashant