Posted to dev@pig.apache.org by "Travis Woodruff (JIRA)" <ji...@apache.org> on 2014/01/06 22:25:51 UTC
[jira] [Updated] (PIG-3649) POPartialAgg incorrectly calculates size reduction when multiple values aggregated
[ https://issues.apache.org/jira/browse/PIG-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Travis Woodruff updated PIG-3649:
---------------------------------
Attachment: PIG-3469.patch
Attaching a patch that updates {{aggregate()}} to count the number of result tuples.
Includes two new tests:
- One that shows that basic aggregation of multiple columns works
- Another that reproduces the issue reported here. This requires aggregating > 10,000 rows, so it is a bit slow. Suggestions for alternative approaches welcome.
> POPartialAgg incorrectly calculates size reduction when multiple values aggregated
> ----------------------------------------------------------------------------------
>
> Key: PIG-3649
> URL: https://issues.apache.org/jira/browse/PIG-3649
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.12.0, 0.11.1
> Reporter: Travis Woodruff
> Attachments: PIG-3469.patch
>
>
> {{POPartialAgg.aggregate()}} counts the number of output columns ({{valueTuple.size() - 1}}), but {{checkSizeReduction()}} compares this to the number of input tuples.
> When multiple columns are aggregated, the reduction factor comes out too high by a factor equal to the number of columns being aggregated. As a result, in-memory aggregation is disabled when it should not be, which adversely affects performance.
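The mismatch described above can be illustrated with a minimal, self-contained sketch. This is not the actual {{POPartialAgg}} code: the class {{SizeReductionSketch}} and its methods are hypothetical stand-ins, and rows are modeled as plain {{long[]}} arrays of the form {key, v1, v2, ...}. It only shows how counting value columns instead of result tuples inflates the number that is later compared with the input tuple count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and data layout are assumptions,
// not taken from the Pig source.
class SizeReductionSketch {

    // Sums the value columns of rows that share a key.
    // Each row is {key, v1, v2, ...}; each result tuple has the same layout.
    static Map<Long, long[]> aggregate(long[][] rows) {
        Map<Long, long[]> result = new LinkedHashMap<>();
        for (long[] row : rows) {
            long[] acc = result.get(row[0]);
            if (acc == null) {
                result.put(row[0], row.clone());
            } else {
                for (int i = 1; i < row.length; i++) {
                    acc[i] += row[i];
                }
            }
        }
        return result;
    }

    // Count as described in the report: value columns per result tuple
    // (tuple size minus the key).
    static int countOutputColumns(Map<Long, long[]> result) {
        int count = 0;
        for (long[] tuple : result.values()) {
            count += tuple.length - 1;
        }
        return count;
    }

    // Count as described for the patch: one per result tuple.
    static int countOutputTuples(Map<Long, long[]> result) {
        return result.size();
    }

    public static void main(String[] args) {
        // Four input tuples, two distinct keys, two aggregated columns.
        long[][] rows = {
            {1, 10, 100}, {1, 20, 200}, {2, 30, 300}, {2, 40, 400},
        };
        Map<Long, long[]> result = aggregate(rows);
        System.out.println("input tuples: " + rows.length);
        System.out.println("column count: " + countOutputColumns(result));
        System.out.println("tuple count:  " + countOutputTuples(result));
    }
}
```

In this example four input tuples collapse into two result tuples, but the column-based count is four, so a comparison against the input count would see no reduction at all. The gap grows with the number of aggregated columns.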