Posted to dev@pig.apache.org by "Allan Avendaño (JIRA)" <ji...@apache.org> on 2012/06/06 18:22:23 UTC

[jira] [Created] (PIG-2743) Output Schema

Allan Avendaño created PIG-2743:
-----------------------------------

             Summary: Output Schema
                 Key: PIG-2743
                 URL: https://issues.apache.org/jira/browse/PIG-2743
             Project: Pig
          Issue Type: Sub-task
            Reporter: Allan Avendaño
            Assignee: Allan Avendaño


For the rank operator, I was considering the following schema:

E.g.
A = load 'data' as (x:int,y:chararray,z:int,rz:chararray);
C = rank A by x;

So the output schema could be: 
C: {x: int,y: chararray,z: int,rz: chararray,A::rank: int}

In general:
{<schema_of_working_alias>,<alias>::rank#int}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Resolved] (PIG-2743) Output Schema

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales resolved PIG-2743.
-------------------------------------------------

    Resolution: Fixed
    

[jira] [Commented] (PIG-2743) Output Schema

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291801#comment-13291801 ] 

Gianmarco De Francisci Morales commented on PIG-2743:
-----------------------------------------------------

An alternative would be to prepend the rank to the tuple (akin to line numbers).
The advantage is that you always know where the rank field ends up (i.e. $0).
I have no strong opinion on it, though.
Does anybody else care to comment?
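
To illustrate the prepend option, here is a minimal Pig Latin sketch. It assumes the rank is emitted as the first field, as suggested above; the alias D and the field name "position" are made up for the example and are not part of the proposal.

```pig
-- Sketch only: assumes rank prepends its result, so the rank is always $0.
A = load 'data' as (x:int, y:chararray, z:int, rz:chararray);
C = rank A by x;
-- The rank can be picked up by position without knowing the schema of A.
D = foreach C generate $0 as position, x;
```

With the appended layout from the issue description, the same positional reference would depend on the width of A ($4 for this four-field example).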
                

       

[jira] [Work started] (PIG-2743) Output Schema

Posted by "Allan Avendaño (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-2743 started by Allan Avendaño.


       

[jira] [Commented] (PIG-2743) Output Schema

Posted by "Allan Avendaño (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400066#comment-13400066 ] 

Allan Avendaño commented on PIG-2743:
-------------------------------------

Currently, I put the rank field at the beginning of the tuple.

I made two small changes to the schema:
1. The rank field is a long.
2. The field is named "rank".
All changes are reflected on ReviewBoard: https://reviews.apache.org/r/5523/diff/#index_header
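
Applied to the example from the issue description, these changes would give roughly the schema sketched below. The schema in the trailing comment is inferred from this comment (field first, type long, name "rank"); it is not copied from an actual Pig run or from the patch.

```pig
A = load 'data' as (x:int, y:chararray, z:int, rz:chararray);
C = rank A by x;
describe C;
-- Schema implied by the changes described above (an inference, not verified output):
-- C: {rank: long, x: int, y: chararray, z: int, rz: chararray}
```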
                

       

[jira] [Commented] (PIG-2743) Output Schema

Posted by "Cristina L. Abad (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399517#comment-13399517 ] 

Cristina L. Abad commented on PIG-2743:
---------------------------------------

I would also find it helpful if the rank ends up in position $0. Does anybody think this would have any disadvantages?
                