Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 02:36:27 UTC

[jira] [Updated] (STORM-7) storm.trident.operation.Aggregator: include group information in init() method

     [ https://issues.apache.org/jira/browse/STORM-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-7:
-----------------------------
    Component/s: storm-core

> storm.trident.operation.Aggregator: include group information in init() method
> ------------------------------------------------------------------------------
>
>                 Key: STORM-7
>                 URL: https://issues.apache.org/jira/browse/STORM-7
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> Reported by @lorenzfischer
> To be able to share resources between different groups in a grouped aggregator, it would be helpful to have information about the group available in the init() method of the aggregator interface.
> The concrete use case is the following:
> For our project we need to count the number of unique values in a field of a grouped stream. We have hundreds of millions of unique values and millions of grouped values. For this reason, we're currently deploying the HyperLogLog class that has generously been made available by the people at Clearspring (https://github.com/clearspring/stream-lib). Naturally, we end up with millions of counter objects.
> The DSI-Utils library (http://dsiutils.di.unimi.it) offers a HyperLogLogCounterArray class that reduces the overhead incurred by this many HLL objects. We're struggling with the implementation in Trident, though, because the init(Object batchId, TridentCollector collector) method of the aggregator interface does not provide any information about the current "group" the aggregator should be initialized for.
> (This was initially posted on Google Groups: https://groups.google.com/forum/#!topic/storm-user/dthUfkMRNhU)
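A minimal Java sketch of the idea behind the request follows. It is illustrative only and is not Storm, Clearspring stream-lib, or DSI-Utils code. All HyperLogLog registers for all groups live in one shared byte array, and a HYPOTHETICAL init(batchId, groupKey) maps each group to its slot in that array. The group parameter is exactly what Trident's init(Object batchId, TridentCollector collector) lacks. The names PackedHllSketch, GroupAwareInit, SharedUniqueCounter and mix64 are made up for this sketch. The hash is the SplitMix64 finalizer, and batch lifecycle, state persistence and the TridentCollector are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: not Storm, Clearspring stream-lib, or DSI-Utils code.
public class PackedHllSketch {

    // SplitMix64 finalizer: a bijective 64-bit mixer used here as the hash.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Many HyperLogLog counters in ONE byte array: counter i owns registers [i*m, (i+1)*m).
    static final class PackedHllArray {
        private final int p;            // precision; m = 2^p registers per counter
        private final int m;
        private final byte[] registers;

        PackedHllArray(int numCounters, int p) {
            this.p = p;
            this.m = 1 << p;
            this.registers = new byte[numCounters * m];
        }

        void add(int counter, long hash) {
            int idx = (int) (hash >>> (64 - p));    // top p bits select the register
            long rest = hash << p;                  // remaining 64 - p bits
            int rho = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
            int slot = counter * m + idx;
            if (rho > registers[slot]) registers[slot] = (byte) rho;
        }

        // Raw HLL estimate, with linear counting for small cardinalities.
        double estimate(int counter) {
            double sum = 0;
            int zeros = 0;
            for (int j = 0; j < m; j++) {
                int r = registers[counter * m + j];
                sum += 1.0 / (1L << r);
                if (r == 0) zeros++;
            }
            double alpha = 0.7213 / (1 + 1.079 / m);   // valid for m >= 128
            double e = alpha * m * m / sum;
            if (e <= 2.5 * m && zeros > 0) e = m * Math.log((double) m / zeros);
            return e;
        }
    }

    // HYPOTHETICAL group-aware init, i.e. what STORM-7 asks for; Trident's
    // Aggregator.init(Object batchId, TridentCollector collector) has no group parameter.
    interface GroupAwareInit<T> {
        T init(Object batchId, Object groupKey);
    }

    // Per-group state is just a slot index into the shared PackedHllArray.
    static final class SharedUniqueCounter implements GroupAwareInit<Integer> {
        private final int capacity;
        private final PackedHllArray counters;
        private final Map<Object, Integer> slots = new HashMap<>();

        SharedUniqueCounter(int capacity, int p) {
            this.capacity = capacity;
            this.counters = new PackedHllArray(capacity, p);
        }

        @Override
        public Integer init(Object batchId, Object groupKey) {
            // batchId is unused here; knowing the group lets every batch reuse that group's slot.
            return slots.computeIfAbsent(groupKey, k -> {
                if (slots.size() >= capacity) throw new IllegalStateException("no free slots");
                return slots.size();
            });
        }

        void aggregate(int slot, long value) { counters.add(slot, mix64(value)); }

        double estimate(int slot) { return counters.estimate(slot); }
    }

    public static void main(String[] args) {
        SharedUniqueCounter agg = new SharedUniqueCounter(1000, 10); // ~1 MB of registers
        int a = agg.init("batch-1", "groupA");
        int b = agg.init("batch-1", "groupB");
        for (long v = 0; v < 50_000; v++) agg.aggregate(a, v);
        for (long v = 0; v < 500; v++) agg.aggregate(b, v % 100); // 100 distinct values
        System.out.printf("groupA ~ %.0f, groupB ~ %.0f%n", agg.estimate(a), agg.estimate(b));
    }
}
```

The design choice is that a single byte[] of numGroups * m entries replaces one counter object per group, each of which carries its own object and array headers. The price is the group-to-slot mapping, and that mapping can only be built if the aggregator learns the group when its state is initialized.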



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)