Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2011/07/15 17:15:06 UTC
[jira] [Created] (PIG-2168) CubeDimensions UDF
CubeDimensions UDF
------------------
Key: PIG-2168
URL: https://issues.apache.org/jira/browse/PIG-2168
Project: Pig
Issue Type: Sub-task
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
A prerequisite for a naive cubing implementation:
A UDF that, given a set of dimensions (a, b, c), generates all the points on the cube:
(a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, null, null), (null, b, null), (null, null, null).
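The enumeration above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the attached patch: the class and method names (CubePointsSketch, cubePoints) are hypothetical, dimensions are modeled as strings, and no Pig classes are used. Each of the 2^n points corresponds to one bitmask over the n dimensions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from the patch.
final class CubePointsSketch {

    // Returns all 2^n cube points for n dimensions.
    // Bit i of the mask decides whether dimension i keeps its value (bit 0)
    // or is replaced by null, meaning "all" (bit 1).
    static List<List<String>> cubePoints(List<String> dims) {
        int n = dims.size();
        List<List<String>> points = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> point = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                point.add((mask & (1 << i)) == 0 ? dims.get(i) : null);
            }
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        // Prints the eight points for (a, b, c), one per line.
        for (List<String> point : cubePoints(Arrays.asList("a", "b", "c"))) {
            System.out.println(point);
        }
    }
}
```

The order of the points depends on the mask order and is not meant to match the order listed in the description.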
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2168) CubeDimensions UDF
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------
Status: Patch Available (was: Open)
> CubeDimensions UDF
> ------------------
>
> Key: PIG-2168
> URL: https://issues.apache.org/jira/browse/PIG-2168
> Project: Pig
> Issue Type: Sub-task
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c) generates all the points on the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, null, null), (null, b, null), (null, null, null).
[jira] [Commented] (PIG-2168) CubeDimensions UDF
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071229#comment-13071229 ]
Thejas M Nair commented on PIG-2168:
------------------------------------
+1
> CubeDimensions UDF
> ------------------
>
> Key: PIG-2168
> URL: https://issues.apache.org/jira/browse/PIG-2168
> Project: Pig
> Issue Type: Sub-task
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2168.2.patch, PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c) generates all the points on the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, null, null), (null, b, null), (null, null, null).
[jira] [Commented] (PIG-2168) CubeDimensions UDF
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071151#comment-13071151 ]
Thejas M Nair commented on PIG-2168:
------------------------------------
Can you please also add Apache license headers to test/org/apache/pig/test/TestCubeDimensions.java and src/org/apache/pig/builtin/CubeDimensions.java? Everything else in test-patch was successful.
[jira] [Updated] (PIG-2168) CubeDimensions UDF
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------
Resolution: Fixed
Release Note:
A new builtin UDF, CubeDimensions, is added to simplify the process of producing cube-like aggregations.
CubeDimensions produces a DataBag with all combinations of the argument tuple members, as in a data cube. For example, (a, b, c) produces the following bag:
{ (a, b, c), (null, null, null), (a, b, null), (a, null, c),
(a, null, null), (null, b, c), (null, null, c), (null, b, null) }
The "all" marker is null by default, but can be set to an arbitrary string by invoking a constructor (via a DEFINE). The constructor takes a single argument, the string you want to represent "all".
Usage goes something like this:
-- 'measure' is assumed to be a numeric field produced by the loader.
events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
-- Expand every record into one record per cube point.
cubed = foreach events generate
    FLATTEN(CubeDimensions(lang, event, app_id))
        as (lang, event, app_id),
    measure;
-- Group by cube point and aggregate.
cube = foreach (group cubed
                by (lang, event, app_id) parallel $P)
       generate
    flatten(group) as (lang, event, app_id),
    COUNT_STAR(cubed),
    SUM(cubed.measure);
store cube into 'event_cube';
Note: doing this with non-algebraic aggregations on large data can result in very slow reducers, since one of the groups, the one where every dimension is "all", receives every record in your relation.
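To make the skew in the note concrete, here is a small simulation in plain Java. It is a hypothetical sketch that uses no Pig APIs: the class and method names and the sample records are invented for illustration. It expands each record into its cube points and counts how many expanded records land in each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of group sizes after cubing; not Pig code.
final class CubeSkewSketch {

    // Expands one record's dimensions into its 2^n cube points (null = "all").
    static List<List<String>> expand(List<String> dims) {
        int n = dims.size();
        List<List<String>> points = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> point = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                point.add((mask & (1 << i)) == 0 ? dims.get(i) : null);
            }
            points.add(point);
        }
        return points;
    }

    // Counts how many expanded records fall into each group key.
    static Map<List<String>, Integer> groupSizes(List<List<String>> records) {
        Map<List<String>, Integer> sizes = new HashMap<>();
        for (List<String> record : records) {
            for (List<String> key : expand(record)) {
                sizes.merge(key, 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Invented sample data: (lang, event, app_id).
        List<List<String>> records = Arrays.asList(
                Arrays.asList("en", "click", "app1"),
                Arrays.asList("en", "view", "app2"),
                Arrays.asList("fr", "click", "app1"));
        Map<List<String>, Integer> sizes = groupSizes(records);
        List<String> allKey = Arrays.asList((String) null, null, null);
        System.out.println("records in the all-null group: " + sizes.get(allKey));
    }
}
```

The all-null group always receives one expanded record per input record, so its size grows with the whole relation. An algebraic aggregate can be partially computed before the reduce phase; a non-algebraic one has to see the whole group in a single reducer.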
Status: Resolved (was: Patch Available)
Committed to 0.10
[jira] [Commented] (PIG-2168) CubeDimensions UDF
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070851#comment-13070851 ]
Thejas M Nair commented on PIG-2168:
------------------------------------
Patch looks good. A minor comment: Lists.newArrayListWithCapacity would be a better choice in this case than Lists.newArrayListWithExpectedSize. Lists.newArrayListWithExpectedSize adds padding, which is unnecessary here.
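A rough way to see the difference, sketched in plain Java without a Guava dependency. The names below are hypothetical, and the padding formula (5 + n + n/10) is an assumption about how Guava sizes lists in newArrayListWithExpectedSize; check the Guava source for the version in use. The point is that a cube over n dimensions has exactly 2^n points, so the exact capacity is known up front.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch comparing an exact capacity with a padded one.
final class ListCapacitySketch {

    // A cube over numDims dimensions has exactly 2^numDims points.
    static int exactCapacity(int numDims) {
        return 1 << numDims;
    }

    // ASSUMPTION: mirrors the padding believed to be applied by Guava's
    // Lists.newArrayListWithExpectedSize (5 + n + n/10, saturated to int).
    static int paddedCapacity(int expectedSize) {
        return (int) Math.min(Integer.MAX_VALUE, 5L + expectedSize + expectedSize / 10);
    }

    public static void main(String[] args) {
        int numDims = 3;
        int exact = exactCapacity(numDims);
        // Plain-Java equivalent of asking for an exact initial capacity.
        List<Object> points = new ArrayList<>(exact);
        System.out.println("exact capacity:  " + exact);
        System.out.println("padded capacity: " + paddedCapacity(exact));
        System.out.println("list starts empty: " + points.isEmpty());
    }
}
```

Since the list is filled with exactly 2^n elements and never grows afterwards, any extra slots are wasted.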
[jira] [Updated] (PIG-2168) CubeDimensions UDF
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------
Attachment: PIG-2168.patch
[jira] [Updated] (PIG-2168) CubeDimensions UDF
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------
Attachment: PIG-2168.2.patch
Attaching a patch that switches to Lists.newArrayListWithCapacity and adds the proper Apache license headers.
[jira] [Closed] (PIG-2168) CubeDimensions UDF
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy closed PIG-2168.
----------------------------------