Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2017/02/20 19:29:44 UTC

[jira] [Commented] (PIG-5110) Removing schema alias and :: coming from parent relation

    [ https://issues.apache.org/jira/browse/PIG-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874984#comment-15874984 ] 

Rohini Palaniswamy commented on PIG-5110:
-----------------------------------------

Looks good in general. A few comments below.
1) Can you rename pig.schema.aliasprepending.disabled to pig.schema.disambiguate.enabled, with a default of true? "Disambiguate operator" is the terminology we use in the documentation. Most Pig, Tez and Hadoop settings end with "enabled", so it would be good to stay consistent. The variable names also have to change from prependingDisabled to disambiguateEnabled in other places.
2) Can you move the documentation to the http://pig.apache.org/docs/r0.16.0/basic.html#disambiguate section? It will have better context there.

{code}
<p>After JOIN, COGROUP, CROSS, or FLATTEN operations, the field names have the original alias and the disambiguate operator ( :: ) prepended in the schema.
The disambiguate operator is used to identify field names in case there is an ambiguity.</p>

<p>In this example, to disambiguate y,  use A::y or B::y.  In cases where there is no ambiguity, such as z, the :: is not necessary but is still supported.</p>

<source>
A = load 'data1' as (x, y);
B = load 'data2' as (x, y, z);
C = join A by x, B by x;
D = foreach C generate A::y, z; -- Cannot simply refer to y as it can refer to A::y or B::y
</source>

<p> Users who prefer not to have the disambiguate operator as part of the schema can disable it by setting the <i>pig.schema.disambiguate.enabled</i> Pig property to "false".
In that case, it is the responsibility of the user to make sure that there is no conflict in the field names.
This is useful when the schema is stored by the StoreFunc, as with PigStorage, JsonStorage, AvroStorage or OrcStorage,
and users want :: removed from the field names without having to add an extra FOREACH to rename the fields.
</p>
{code}
3) For testDisabledPrependingFailsForDupeAliases, can you catch the exception and assert on its message instead of using the expected annotation?
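
To illustrate points 1) and 3), here is a minimal sketch in plain Java. It is not taken from the patch: the class DisambiguateSketch, the helper joinFieldNames, the exception DuplicateFieldException and the message text are all hypothetical stand-ins. Only the property name pig.schema.disambiguate.enabled and its default of true come from the comments above. It shows the property being read with a default of true, and a check that catches the failure and inspects its message rather than only asserting the exception type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

class DisambiguateSketch {
    // Property name proposed in the review; treated as true when unset.
    static final String DISAMBIGUATE_ENABLED = "pig.schema.disambiguate.enabled";

    // Hypothetical stand-in for the exception the real code would throw
    // (the issue mentions FrontendException). Unchecked to keep the sketch short.
    static class DuplicateFieldException extends RuntimeException {
        DuplicateFieldException(String message) {
            super(message);
        }
    }

    // Hypothetical helper: computes the output field names of a two-way join.
    static List<String> joinFieldNames(Properties props,
            String leftAlias, List<String> leftFields,
            String rightAlias, List<String> rightFields) {
        boolean disambiguateEnabled =
                Boolean.parseBoolean(props.getProperty(DISAMBIGUATE_ENABLED, "true"));
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        addFields(out, seen, disambiguateEnabled, leftAlias, leftFields);
        addFields(out, seen, disambiguateEnabled, rightAlias, rightFields);
        return out;
    }

    private static void addFields(List<String> out, Set<String> seen,
            boolean disambiguateEnabled, String alias, List<String> fields) {
        for (String field : fields) {
            // Prepend "alias::" only when disambiguation is enabled.
            String name = disambiguateEnabled ? alias + "::" + field : field;
            if (!seen.add(name)) {
                throw new DuplicateFieldException("Duplicate field name: " + name);
            }
            out.add(name);
        }
    }

    // Test-style check: catch the exception and return its message so the
    // caller can assert on it, instead of only declaring an expected type.
    static String failureMessageWhenDisabled() {
        Properties props = new Properties();
        props.setProperty(DISAMBIGUATE_ENABLED, "false");
        try {
            joinFieldNames(props,
                    "A", Arrays.asList("x", "y"),
                    "B", Arrays.asList("x", "y", "z"));
        } catch (DuplicateFieldException e) {
            return e.getMessage();
        }
        // Reaching this point means no exception was thrown.
        throw new AssertionError("Expected a duplicate field name failure");
    }

    public static void main(String[] args) {
        // Default behaviour: every field carries its parent alias and ::
        System.out.println(joinFieldNames(new Properties(),
                "A", Arrays.asList("x", "y"),
                "B", Arrays.asList("x", "y", "z")));

        String message = failureMessageWhenDisabled();
        if (!message.contains("Duplicate field name: x")) {
            throw new AssertionError("Unexpected message: " + message);
        }
        System.out.println(message);
    }
}
```

In an actual JUnit test the same shape would apply: call the code under test inside a try block, fail if no exception is thrown, and assert on the caught exception's message. That way the test cannot pass because some unrelated exception of the same type was raised.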

> Removing schema alias and :: coming from parent relation
> --------------------------------------------------------
>
>                 Key: PIG-5110
>                 URL: https://issues.apache.org/jira/browse/PIG-5110
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>         Attachments: PIG-5110.0.patch
>
>
> Customers have asked for a feature to get rid of the schema alias prefixes. CROSS, JOIN, FLATTEN, etc. prepend the field name with the parent field alias and ::.
> I would like to find a way to disable this feature. (The burden of making sure there are no duplicate aliases, which would cause the appropriate FrontendException to be thrown, is on the user.)


