Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2016/05/24 22:37:14 UTC

[jira] [Updated] (PIG-3911) Define unique fields with @OutputSchema

     [ https://issues.apache.org/jira/browse/PIG-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3911:
------------------------------------
    Fix Version/s:     (was: 0.16.0)
                   0.17.0

> Define unique fields with @OutputSchema
> ---------------------------------------
>
>                 Key: PIG-3911
>                 URL: https://issues.apache.org/jira/browse/PIG-3911
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.11, 0.12.0, 0.11.1, 0.12.1, 0.13.0
>            Reporter: Lorand Bendig
>            Assignee: Lorand Bendig
>             Fix For: 0.17.0
>
>         Attachments: PIG-3911.patch
>
>
> Based on PIG-2361, I took the liberty of extending {{@OutputSchema}} so that more flexible output schemas can be defined through annotations. As a result, the repetitive {{EvalFunc#outputSchema()}} overrides can be eliminated from most UDFs.
> Examples:
> {code}
> @OutputSchema("bytearray")
> {code}
> => equivalent to:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
>   return new Schema(new Schema.FieldSchema(null, DataType.BYTEARRAY));
> }
> {code}
> {code}
> @OutputSchema("chararray")
> @Unique
> {code}
> => equivalent to:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
>   return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), DataType.CHARARRAY));
> }
> {code}
> {code}
> @OutputSchema(value = "dimensions:bag", useInputSchema = true)
> {code}
> => equivalent to:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
>   return new Schema(new Schema.FieldSchema("dimensions", input, DataType.BAG));
> }
> {code}
> {code}
> @OutputSchema(value = "${0}:bag", useInputSchema = true)
> @Unique("${0}")
> {code}
> => equivalent to:
> {code}
> @Override
> public Schema outputSchema(Schema input) {
>   return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), input, DataType.BAG));
> }
> {code}
> If the {{useInputSchema}} attribute is set, the input schema is applied to the output schema, provided that:
> * the output schema is "simple", i.e. \[name\]\[:type\] or '()', '{}', '[]', and
> * it has a complex field type (tuple, bag, map)
> {{@Unique}}: this annotation defines which fields should be unique in the schema
> * if no parameters are provided, all fields will be unique
> * otherwise it takes a string array of field names
> Unique field generation:
> A unique field name is generated in the same manner as in {{EvalFunc#getSchemaName}}.
> * if field has an alias:
>   ** if the alias is a placeholder ($\{i\}, i=0..n): fieldName -> com\_myfunc\_\[input_alias\]\_\[nextSchemaId\]
>   ** otherwise: fieldName -> fieldName\_\[nextSchemaId\]
> * otherwise: com\_myfunc\_\[input_alias\]\_\[nextSchemaId\]
> Supported scripting UDFs: Python, Jython, Groovy, JRuby
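
The unique-field naming rules quoted above could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not code from the attached patch: the class UniqueFieldNames, the method uniqueName, and the explicit schemaId parameter are hypothetical stand-ins (the description mentions nextSchemaId without showing how it is obtained), and replacing the dots of the class name with underscores is inferred from the com_myfunc examples.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Pig or of PIG-3911.patch.
class UniqueFieldNames {

    // Matches placeholder aliases such as ${0}, ${1}, ...
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{\\d+\\}");

    /**
     * Builds a unique field name following the rules in the issue description.
     *
     * @param fieldAlias alias from the annotation, or null if the field has none
     * @param className  fully qualified UDF class name, e.g. "com.MyFunc"
     * @param inputAlias first alias of the input schema, or null if there is none
     * @param schemaId   assumed to be supplied by the caller (stands in for nextSchemaId)
     */
    static String uniqueName(String fieldAlias, String className,
                             String inputAlias, int schemaId) {
        boolean hasAlias = fieldAlias != null && !fieldAlias.isEmpty();
        boolean isPlaceholder = hasAlias && PLACEHOLDER.matcher(fieldAlias).matches();

        // A plain alias keeps its name and only gets the id appended.
        if (hasAlias && !isPlaceholder) {
            return fieldAlias + "_" + schemaId;
        }

        // Placeholder or missing alias: derive the name from the UDF class
        // and the input alias (assumption: dots become underscores).
        String prefix = className.toLowerCase(Locale.ROOT).replace('.', '_');
        String aliasPart = (inputAlias == null || inputAlias.isEmpty())
                ? ""
                : inputAlias + "_";
        return prefix + "_" + aliasPart + schemaId;
    }
}
```

With these assumptions, a placeholder alias "${0}" and a missing alias both produce the class-based name, while an alias such as "dimensions" is only suffixed with the id.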



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)