Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2011/07/15 17:15:06 UTC

[jira] [Created] (PIG-2168) CubeDimensions UDF

CubeDimensions UDF
------------------

                 Key: PIG-2168
                 URL: https://issues.apache.org/jira/browse/PIG-2168
             Project: Pig
          Issue Type: Sub-task
            Reporter: Dmitriy V. Ryaboy
            Assignee: Dmitriy V. Ryaboy


A prerequisite for a naive cubing implementation:
A UDF that, given a set of dimensions (a, b, c), generates all 2^3 = 8 points on the cube:
(a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, null, null), (null, b, null), (null, null, null).
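
The enumeration described above can be sketched with a bitmask loop. This is a minimal standalone illustration, not the code from the attached patch: the class name CubePointsSketch and the method cubePoints are hypothetical, and plain Java lists stand in for Pig tuples and bags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the CubeDimensions implementation from the patch.
class CubePointsSketch {
    // Returns all 2^n cube points for the given dimension values.
    // Bit i of the mask decides whether dimension i keeps its value (bit = 0)
    // or is rolled up to null (bit = 1), so mask 0 yields the original tuple
    // and the last mask yields the all-null tuple.
    static List<List<String>> cubePoints(List<String> dims) {
        int n = dims.size();
        List<List<String>> points = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> point = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                point.add((mask & (1 << i)) == 0 ? dims.get(i) : null);
            }
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        // Prints the eight combinations for (a, b, c), one per line.
        for (List<String> p : cubePoints(Arrays.asList("a", "b", "c"))) {
            System.out.println(p);
        }
    }
}
```

Because each of the n dimensions is either kept or rolled up, the output grows as 2^n, which is worth keeping in mind before cubing on many dimensions.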

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2168) CubeDimensions UDF

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------

    Status: Patch Available  (was: Open)

> CubeDimensions UDF
> ------------------
>
>                 Key: PIG-2168
>                 URL: https://issues.apache.org/jira/browse/PIG-2168
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.10
>
>         Attachments: PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c) generates all the points on the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, null, null), (null, b, null), (null, null, null).


[jira] [Commented] (PIG-2168) CubeDimensions UDF

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071229#comment-13071229 ] 

Thejas M Nair commented on PIG-2168:
------------------------------------

+1



[jira] [Commented] (PIG-2168) CubeDimensions UDF

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071151#comment-13071151 ] 

Thejas M Nair commented on PIG-2168:
------------------------------------

Can you please also add Apache license headers to test/org/apache/pig/test/TestCubeDimensions.java and src/org/apache/pig/builtin/CubeDimensions.java? Everything else in test-patch passed.





[jira] [Updated] (PIG-2168) CubeDimensions UDF

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------

      Resolution: Fixed
    Release Note: 
A new builtin UDF, CubeDimensions, is added to simplify the process of producing cube-like aggregations.

CubeDimensions produces a DataBag containing every combination of the argument tuple's members, as in a data cube. For example, (a, b, c) produces the following bag:

 { (a, b, c), (null, null, null), (a, b, null), (a, null, c),
   (a, null, null), (null, b, c), (null, null, c), (null, b, null) }
 
The "all" marker is null by default, but can be set to an arbitrary string by invoking a constructor (via a DEFINE). The constructor takes a single argument, the string you want to represent "all".

Usage goes something like this:

 -- 'measure' is added to the load schema here so that the later references resolve.
 events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
 cubed = foreach events generate
   FLATTEN(CubeDimensions(lang, event, app_id))
     as (lang, event, app_id),
   measure;
 cube = foreach (group cubed
                 by (lang, event, app_id) parallel $P)
        generate
   flatten(group) as (lang, event, app_id),
   COUNT_STAR(cubed),
   SUM(cubed.measure);
 store cube into 'event_cube';

Note: doing this with non-algebraic aggregations on large data can result in very slow reducers, since one of the groups is going to get all the records in your relation.
          Status: Resolved  (was: Patch Available)

Committed to 0.10
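
The release note's warning about slow reducers can be illustrated with a small standalone sketch. It expands a few made-up records into their cube points and counts how many rows land in each group. The class CubeSkewSketch, its methods, the sample records, and the "ALL" marker value are all hypothetical; plain Java collections stand in for Pig's grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of group sizes after cubing; not Pig code.
class CubeSkewSketch {
    // Expands one record into its 2^n cube points, writing allMarker
    // (a hypothetical parameter name) in place of each rolled-up dimension.
    static List<List<String>> cubePoints(List<String> dims, String allMarker) {
        int n = dims.size();
        List<List<String>> points = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> point = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                point.add((mask & (1 << i)) == 0 ? dims.get(i) : allMarker);
            }
            points.add(point);
        }
        return points;
    }

    // Counts how many expanded rows fall into each group key,
    // which is the amount of input a reducer would see per group.
    static Map<List<String>, Integer> groupSizes(List<List<String>> records, String allMarker) {
        Map<List<String>, Integer> sizes = new HashMap<>();
        for (List<String> record : records) {
            for (List<String> point : cubePoints(record, allMarker)) {
                sizes.merge(point, 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Made-up (lang, event, app_id) records.
        List<List<String>> records = Arrays.asList(
            Arrays.asList("en", "click", "1"),
            Arrays.asList("en", "view", "2"),
            Arrays.asList("fr", "click", "1"),
            Arrays.asList("de", "view", "3"));
        Map<List<String>, Integer> sizes = groupSizes(records, "ALL");
        // The fully rolled-up group receives one row per input record.
        System.out.println(sizes.get(Arrays.asList("ALL", "ALL", "ALL"))); // 4
        System.out.println(sizes.get(Arrays.asList("en", "ALL", "ALL")));  // 2
    }
}
```

The fully rolled-up key always collects as many rows as the input has records, so an aggregation that cannot be partially computed map-side has to process the whole relation in a single group.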



[jira] [Commented] (PIG-2168) CubeDimensions UDF

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070851#comment-13070851 ] 

Thejas M Nair commented on PIG-2168:
------------------------------------

Patch looks good. A minor comment: Lists.newArrayListWithCapacity would be a better fit for this case than Lists.newArrayListWithExpectedSize. Lists.newArrayListWithExpectedSize adds padding, which is unnecessary here.
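
A rough illustration of the difference, using only the JDK. This assumes that Guava's newArrayListWithCapacity(n) amounts to new ArrayList<>(n), and that newArrayListWithExpectedSize(n) pads the initial capacity to about 5 + n + n/10. The helper paddedCapacity is a hypothetical stand-in for that formula, not Guava code, and the assumption that the list holds exactly 2^n cube points is an inference from the issue description.

```java
// Hypothetical sketch comparing an exact capacity with a padded one.
class CapacitySketch {
    // Stand-in for the padding assumed to be applied by
    // Lists.newArrayListWithExpectedSize: 5 + n + n/10, capped at int range.
    static int paddedCapacity(int expectedSize) {
        return (int) Math.min(Integer.MAX_VALUE, 5L + expectedSize + expectedSize / 10);
    }

    // Number of cube points for the given number of dimensions: exactly 2^n.
    static int cubeSize(int dimensions) {
        return 1 << dimensions;
    }

    public static void main(String[] args) {
        int exact = cubeSize(3);
        System.out.println("exact capacity:  " + exact);                 // 8
        System.out.println("padded capacity: " + paddedCapacity(exact)); // 13
    }
}
```

When the final size is known in advance, the exact capacity avoids allocating slots that will never be filled.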




[jira] [Updated] (PIG-2168) CubeDimensions UDF

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------

    Attachment: PIG-2168.patch



[jira] [Updated] (PIG-2168) CubeDimensions UDF

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------

    Attachment: PIG-2168.2.patch

Attaching a patch that switches to Lists.newArrayListWithCapacity and adds proper Apache license headers.



[jira] [Closed] (PIG-2168) CubeDimensions UDF

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy closed PIG-2168.
----------------------------------


