Posted to dev@pig.apache.org by "David Ciemiewicz (JIRA)" <ji...@apache.org> on 2008/12/22 20:00:44 UTC

[jira] Created: (PIG-575) Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema

Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema
---------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-575
                 URL: https://issues.apache.org/jira/browse/PIG-575
             Project: Pig
          Issue Type: Improvement
            Reporter: David Ciemiewicz


I have discovered that it is not possible to recurse through parts of the input Schema in the UDF outputSchema function.

I have a function that operates on an input bag of tuples and then creates sequential pairings of the rows.

A = foreach One generate { 
( 1, a ),
( 2, b )
}   as  bag { tuple ( seq: int, value: chararray ) };

The output of PAIRS(A) should be:

{
( ( 1, a ), ( 2, b ) ),
( ( 2, b ), ( null, null ) )
}
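
For readers who want the pairing behaviour spelled out, here is a minimal sketch in plain Java, outside of Pig. The class name SequentialPairs and the method pairs are hypothetical and not part of the actual UDF, and the sketch pairs the last row with a plain null where the example above shows a tuple of nulls.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the PAIRS UDF itself: it only illustrates the
// "each row with its successor" pairing on an ordinary Java list.
class SequentialPairs {

    /** Pairs each row with the row that follows it; the last row gets null. */
    static <T> List<Map.Entry<T, T>> pairs(List<T> rows) {
        List<Map.Entry<T, T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            // The final row has no successor, so its partner is null.
            T next = (i + 1 < rows.size()) ? rows.get(i + 1) : null;
            out.add(new SimpleEntry<>(rows.get(i), next));
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows are plain strings here; a real UDF would work on Pig tuples.
        System.out.println(pairs(List.of("(1, a)", "(2, b)")));
    }
}
```

An input of N rows always yields N pairs, which matches the two-row example above.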

The default output schema for the function should be:

bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, value: chararray ) ) }

The problem I have is that I'm not able to recurse into the internal Schema of the FieldSchema in my outputSchema function to get at the tuple within the input bag.

Here's my sample outputSchema for PAIRS:

    public Schema outputSchema(Schema input) {
        try {
            System.out.println("input: " + input.toString());

            Schema databagSchema = new Schema();
            Schema tupleSchema = new Schema();

            Schema inputDataBag = new Schema(input.getFields().get(0));
            System.out.println("inputDataBag: " + input.getFields().get(0).toString());

            //
            //  RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields().get(0).getSchema()
            //
            Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0);  // Here's where I want to say
            System.out.println("inputTuple: " + inputTuple.toString());

            databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
            System.out.println("databagSchema: " + databagSchema.toString());

            return new Schema(
                new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    databagSchema,
                    DataType.BAG
                )
            );
        } catch (Exception e) {
            return null;
        }
    }

Here's the execution output from outputSchema:

input: {A: {seq: int,value: chararray},int,int}
inputDataBag: A: bag({seq: int,value: chararray})
inputTuple: A: bag({seq: int,value: chararray})    <= what I want to see is ( seq: int, value: chararray )
rowSchema: A: bag({seq: int,value: chararray})
rowSchema: A: bag({seq: int,value: chararray})


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-575) Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658625#action_12658625 ] 

Santhosh Srinivasan commented on PIG-575:
-----------------------------------------

The FieldSchema member variable schema is public. It can be accessed directly without a getSchema() method, although having the method could make the code cleaner.
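
As a rough illustration of that suggestion, the sketch below reads a nested schema through a public schema field and builds a bag of tuple pairs from it. MiniSchema, MiniField and PairsSchemaSketch are simplified stand-ins invented for this example so that it runs without Pig on the classpath; they are not Pig's Schema and Schema.FieldSchema classes, and the "pairs" alias is made up. It also assumes the bag's inner schema holds a single tuple field, which may not match how a given Pig version lays out bag schemas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Stand-in for a schema: an ordered list of fields. NOT Pig's Schema class.
class MiniSchema {
    final List<MiniField> fields = new ArrayList<>();

    MiniSchema(MiniField... fs) {
        fields.addAll(Arrays.asList(fs));
    }

    @Override
    public String toString() {
        StringJoiner joined = new StringJoiner(", ", "(", ")");
        for (MiniField f : fields) {
            joined.add(f.toString());
        }
        return joined.toString();
    }
}

// Stand-in for a field schema with a public nested schema, mirroring the
// comment above that the schema member is public. NOT Pig's FieldSchema.
class MiniField {
    public final String alias;      // may be null for unnamed fields
    public final String type;       // e.g. "int", "tuple", "bag"
    public final MiniSchema schema; // null for scalar types

    MiniField(String alias, String type, MiniSchema schema) {
        this.alias = alias;
        this.type = type;
        this.schema = schema;
    }

    @Override
    public String toString() {
        String name = (alias == null) ? "" : alias + ": ";
        return (schema == null) ? name + type : name + type + " " + schema;
    }
}

class PairsSchemaSketch {
    /** Builds bag { tuple ( tuple (row), tuple (row) ) } from the input bag's row schema. */
    static MiniSchema outputSchema(MiniSchema input) {
        MiniField bag = input.fields.get(0);
        // Reach into the nested schema directly through the public field.
        MiniField row = bag.schema.fields.get(0);
        MiniField first = new MiniField(null, "tuple", row.schema);
        MiniField second = new MiniField(null, "tuple", row.schema);
        MiniField pair = new MiniField(null, "tuple", new MiniSchema(first, second));
        return new MiniSchema(new MiniField("pairs", "bag", new MiniSchema(pair)));
    }

    public static void main(String[] args) {
        MiniField seq = new MiniField("seq", "int", null);
        MiniField value = new MiniField("value", "chararray", null);
        MiniField row = new MiniField(null, "tuple", new MiniSchema(seq, value));
        MiniField bag = new MiniField("A", "bag", new MiniSchema(row));
        System.out.println(outputSchema(new MiniSchema(bag)));
    }
}
```

In a real UDF the same two steps (take the first input field, then read its schema member) would be applied to the Schema passed to outputSchema, with null checks for inputs that carry no nested schema.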




[jira] Updated: (PIG-575) Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema

Posted by "David Ciemiewicz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Ciemiewicz updated PIG-575:
---------------------------------

    Component/s: impl
       Priority: Minor  (was: Major)

