You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Rajasekar Elango <re...@salesforce.com> on 2015/08/31 16:57:49 UTC

Aggregating by time window with storm Trident

We have time series data in kafka and we want to aggregate it in storm
using trident. I was able to get data aggregated using persistentAggregate
based onFAQ <https://storm.apache.org/documentation/FAQ.html>. But
aggregation is always done within small batches, I could not figure out a
way to detect when all events for a one minute time window is processed.
Calling each after persistentAggregate(...).newValuesStream() returns
results as soon as a batch is processed, but I want to aggregate values
across multiple batches for a time window. I could not find good answer or
example online. I also see mixed opinion, some people say it's not possible
to do time window aggregation in trident, some people say it's possible
(especially FAQ <https://storm.apache.org/documentation/FAQ.html> looks
promising). The alternate option seem to be using tick tuples with storm
basic, but would prefer to do it in trident as it has better guaranteed
processing semantics and abstraction for persistence.

Can some one provide more details or examples on how to do this?


-- 
Thanks,
Raja.

Re: Aggregating by time window with storm Trident

Posted by Indranil Roy <in...@gmail.com>.
You might want to look at this - its not exactly what you want but may be
what you need.

https://tomdzk.wordpress.com/2011/09/28/storm-esper/

On Mon, Aug 31, 2015 at 8:27 PM Rajasekar Elango <re...@salesforce.com>
wrote:

> We have time series data in kafka and we want to aggregate it in storm
> using trident. I was able to get data aggregated using persistentAggregate
> based onFAQ <https://storm.apache.org/documentation/FAQ.html>. But
> aggregation is always done within small batches, I could not figure out a
> way to detect when all events for a one minute time window is processed.
> Calling each after persistentAggregate(...).newValuesStream() returns
> results as soon as a batch is processed, but I want to aggregate values
> across multiple batches for a time window. I could not find good answer or
> example online. I also see mixed opinion, some people say it's not possible
> to do time window aggregation in trident, some people say it's possible
> (especially FAQ <https://storm.apache.org/documentation/FAQ.html> looks
> promising). The alternate option seem to be using tick tuples with storm
> basic, but would prefer to do it in trident as it has better guaranteed
> processing semantics and abstraction for persistence.
>
> Can some one provide more details or examples on how to do this?
>
>
> --
> Thanks,
> Raja.
>
-- 
Indranil RoyChowdhury
+91-9830027560