Posted to dev@pig.apache.org by "David Ciemiewicz (JIRA)" <ji...@apache.org> on 2008/12/22 20:02:44 UTC
[jira] Updated: (PIG-575) Please extend FieldSchema class with
getSchema() member function for iterating over complex Schemas in Pig UDF
outputSchema
[ https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Ciemiewicz updated PIG-575:
---------------------------------
Component/s: impl
Priority: Minor (was: Major)
> Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-575
> URL: https://issues.apache.org/jira/browse/PIG-575
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: David Ciemiewicz
> Priority: Minor
>
> I have discovered that it is not possible to recurse through parts of the input Schema in the UDF outputSchema function.
> I have a function that operates on an input bag of tuples and then creates sequential pairings of the rows.
> A = foreach One generate {
> ( 1, a ),
> ( 2, b )
> } as bag { tuple ( seq: int, value: chararray ) };
> The output of PAIRS(A) should be:
> {
> ( ( 1, a ), ( 2, b ) ),
> ( ( 2, b ), ( null, null ) )
> }
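The pairing behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name PairsSketch and the use of nested java.util.List values in place of Pig's bag and tuple types are hypothetical and not taken from the reporter's UDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: rows are modelled as List<Object>,
// not as Pig tuples, and the bag is modelled as a List of rows.
public class PairsSketch {

    // Pairs each row with the row that follows it. The last row has no
    // successor, so it is paired with a row of nulls of the same width.
    static List<List<List<Object>>> pairs(List<List<Object>> rows) {
        List<List<List<Object>>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<Object> current = rows.get(i);
            List<Object> next;
            if (i + 1 < rows.size()) {
                next = rows.get(i + 1);
            } else {
                next = Collections.<Object>nCopies(current.size(), null);
            }
            out.add(Arrays.asList(current, next));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList(1, "a"),
                Arrays.<Object>asList(2, "b"));
        // prints [[[1, a], [2, b]], [[2, b], [null, null]]]
        System.out.println(pairs(rows));
    }
}
```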
> The default output schema for the function should be:
> bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, value: chararray ) ) }
> The problem I have is that I'm not able to recurse into the internal Schema of the FieldSchema in my outputSchema function to get at the tuple within the input bag.
> Here's my sample outputSchema for PAIRS:
> public Schema outputSchema(Schema input) {
>     try {
>         System.out.println("input: " + input.toString());
>         Schema databagSchema = new Schema();
>         Schema tupleSchema = new Schema();
>         Schema inputDataBag = new Schema(input.getFields().get(0));
>         System.out.println("inputDataBag: " + input.getFields().get(0).toString());
>         //
>         // RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema
>         //
>         Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0); // Here's where I want to say
>         System.out.println("inputTuple: " + inputTuple.toString());
>         databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
>         System.out.println("databagSchema: " + databagSchema.toString());
>         return new Schema(
>             new Schema.FieldSchema(
>                 getSchemaName( this.getClass().getName().toLowerCase(), input),
>                 databagSchema,
>                 DataType.BAG
>             )
>         );
>     } catch (Exception e) {
>         return null;
>     }
> }
> Here's the execution output from outputSchema:
> input: {A: {seq: int,value: chararray},int,int}
> inputDataBag: A: bag({seq: int,value: chararray})
> inputTuple: A: bag({seq: int,value: chararray}) <= what I want to see is ( seq: int, value: chararray )
> rowSchema: A: bag({seq: int,value: chararray})
> rowSchema: A: bag({seq: int,value: chararray})
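To make the request concrete, here is a minimal, self-contained sketch of how a getSchema() accessor on a field schema would let outputSchema descend from the bag to the tuple inside it and build the paired result. MiniSchema and MiniField are hypothetical stand-ins written for this example, not Pig's Schema and Schema.FieldSchema classes, and the getSchema() shown is the accessor being asked for here, not a method this message confirms to exist in Pig.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of nested schemas; none of these types are Pig classes.
public class PairsSchemaSketch {

    // Stand-in for a schema: an ordered list of fields.
    static final class MiniSchema {
        private final List<MiniField> fields = new ArrayList<>();

        MiniSchema add(MiniField field) {
            fields.add(field);
            return this;
        }

        List<MiniField> getFields() {
            return fields;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (MiniField f : fields) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(f);
            }
            return sb.toString();
        }
    }

    // Stand-in for a field schema: alias, type name, and an optional nested schema.
    static final class MiniField {
        final String alias;      // may be null for unnamed fields
        final String type;       // "int", "chararray", "tuple", "bag"
        final MiniSchema schema; // null for scalar types

        MiniField(String alias, String type, MiniSchema schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }

        // The accessor requested in this issue: exposes the nested schema.
        MiniSchema getSchema() {
            return schema;
        }

        @Override
        public String toString() {
            String prefix = (alias == null) ? "" : alias + ": ";
            return (schema == null) ? prefix + type : prefix + type + "(" + schema + ")";
        }
    }

    // Builds bag(tuple(tuple(row), tuple(row))) from an input whose first field is bag(tuple(row)).
    static MiniSchema pairsOutputSchema(MiniSchema input) {
        MiniField inputBag = input.getFields().get(0);
        MiniField inputTuple = inputBag.getSchema().getFields().get(0);
        MiniSchema row = inputTuple.getSchema();

        MiniSchema pair = new MiniSchema()
                .add(new MiniField(null, "tuple", row))
                .add(new MiniField(null, "tuple", row));
        MiniSchema bagContents = new MiniSchema().add(new MiniField(null, "tuple", pair));
        return new MiniSchema().add(new MiniField("pairs", "bag", bagContents));
    }

    // Input modelled on the example above: A: bag(tuple(seq: int, value: chararray)), int, int
    static MiniSchema sampleInput() {
        MiniSchema row = new MiniSchema()
                .add(new MiniField("seq", "int", null))
                .add(new MiniField("value", "chararray", null));
        MiniSchema bagContents = new MiniSchema().add(new MiniField(null, "tuple", row));
        return new MiniSchema()
                .add(new MiniField("A", "bag", bagContents))
                .add(new MiniField(null, "int", null))
                .add(new MiniField(null, "int", null));
    }

    public static void main(String[] args) {
        System.out.println(pairsOutputSchema(sampleInput()));
    }
}
```

The sketch reuses the input field names (seq, value) for both halves of each pair. Renaming them in the output would be a separate step.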