Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2011/04/09 02:26:05 UTC

[jira] [Resolved] (PIG-1984) Need to clarify unknown schema

     [ https://issues.apache.org/jira/browse/PIG-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-1984.
---------------------------------

    Resolution: Fixed

We already include the following in the 0.9 documentation:

Known Schema Handling

Note the following:

    * You can define a schema that includes both the field name and field type.
    * You can define a schema that includes the field name only; in this case, the field type defaults to bytearray.
    * You can choose not to define a schema; in this case, the field is un-named and the field type defaults to bytearray.

If you assign a name to a field, you can refer to that field by name or by positional notation. If you don't assign a name to a field (the field is un-named), you can refer to it only by positional notation.

If you assign a type to a field, you can subsequently change the type using the cast operators. If you don't assign a type to a field, the field defaults to bytearray; you can change the default type using the cast operators.
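
As a rough illustration of the three cases above (a sketch only; the file name 'data' and the field names are made up for this example):

    -- name and type both declared
    a = LOAD 'data' AS (name:chararray, age:int);
    -- names only; both fields default to bytearray
    b = LOAD 'data' AS (name, age);
    -- no schema; fields are un-named and treated as bytearray
    c = LOAD 'data';

    -- named fields can be referenced by name or by position
    d = FOREACH a GENERATE name, $1;
    -- un-named fields can be referenced by position only;
    -- a cast changes the default bytearray type
    e = FOREACH c GENERATE $0, (int)$1;

Running DESCRIBE on each alias should show which of the three cases applies.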

Unknown Schema Handling

Note the following:

    * When you JOIN/COGROUP/CROSS multiple relations, if any relation has a null schema (no defined schema), the schema for the resulting relation is null.
    * If you FLATTEN a bag with an empty inner schema, the schema for the resulting relation is null.
    * If you UNION two relations with incompatible schemas, the schema for the resulting relation is null.
    * If the schema is null, Pig treats all fields as bytearray (in the backend, Pig determines the real type of each field dynamically).
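
A minimal sketch of how a null schema propagates, assuming two hypothetical input files 'a.txt' and 'b.txt' (the file and field names are invented for this example):

    a = LOAD 'a.txt' AS (a0:int, a1:chararray);
    -- no AS clause, so b has a null schema
    b = LOAD 'b.txt';

    -- one input has a null schema, so the schema of c is null as well
    c = JOIN a BY a0, b BY $0;
    DESCRIBE c;

    -- with a null schema, fields are reachable by position only
    -- and are treated as bytearray unless cast
    d = FOREACH c GENERATE $0, (chararray)$1;

Here DESCRIBE c is expected to report an unknown schema rather than a list of fields, so downstream statements have to fall back on positional notation and explicit casts.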


> Need to clarify unknown schema
> ------------------------------
>
>                 Key: PIG-1984
>                 URL: https://issues.apache.org/jira/browse/PIG-1984
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Corinne Chandel
>             Fix For: 0.9.0
>
>
> We need to clarify how unknown schema is used in Pig. For every field, if the user doesn't tell us the data type, we use bytearray to denote an unknown type. When we don't even know how many fields there are, Pig derives an unknown (null) schema.
> For example:
> a = load '1.txt' as (a0, b0);
> describe a;
> a: {a0: bytearray,b0: bytearray}
> a = load '1.txt';
> describe a;
> a: Schema for a unknown

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira