You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Jon Yeargers <jo...@cedexis.com> on 2017/03/20 11:51:52 UTC

clearing an aggregation?

Is this possible? Im wondering about gathering data from a stream into a
series of windowed aggregators: minute, hour and day. A separate process
would start at fixed intervals, query the appropriate state store for
available values and then hopefully clear / zero / reset everything for the
next interval.

I could use the retention period setting but I would (somehow) need to
guarantee that the windows would reset on clock boundaries and not based on
start time for the app.

Re: clearing an aggregation?

Posted by Michael Noll <mi...@confluent.io>.
> Since apparently there isn't a way to iterate through Windowed KTables Im
> guessing that this sort of 'aggregate and clear' approach still requires
an
> external datastore (like Redis). Please correct me if Im wrong.

You don't need an external datastore.  You can use state stores for that:
http://docs.confluent.io/current/streams/developer-guide.html#state-stores

FWIW, a newer question of yours was asking how to access data in state
stores "from the outside", which (for the record) I answered by pointing to
Kafka's interactive queries feature:
http://docs.confluent.io/current/streams/developer-guide.html#interactive-queries

-Michael





On Wed, Mar 22, 2017 at 10:00 PM, Jon Yeargers <jo...@cedexis.com>
wrote:

> I get that the windows are aligned along seconds but this doesn't really
> help with true clock alignment (IE top of the hour, midnight, etc).
>
> I can imagine a strategy using overlapping windows. One would
> (hypothetically) walk through the list until a window that spanned the
> desired time was found.
>
> Since apparently there isn't a way to iterate through Windowed KTables Im
> guessing that this sort of 'aggregate and clear' approach still requires an
> external datastore (like Redis). Please correct me if Im wrong.
>
> On Mon, Mar 20, 2017 at 9:29 AM, Michael Noll <mi...@confluent.io>
> wrote:
>
> > Jon,
> >
> > the windowing operation of Kafka's Streams API (in its DSL) aligns
> > time-based windows to the epoch [1]:
> >
> > Quoting from e.g. hopping windows (sometimes called sliding windows in
> > other technologies):
> >
> > > Hopping time windows are aligned to the epoch, with the lower interval
> > bound
> > > being inclusive and the upper bound being exclusive. “Aligned to the
> > epoch”
> > > means that the first window starts at timestamp zero.
> > > For example, hopping windows with a size of 5000ms and an advance
> > interval
> > > (“hop”) of 3000ms have predictable window boundaries
> > `[0;5000),[3000;8000),...`
> > > — and not `[1000;6000),[4000;9000),...` or even something “random” like
> > > `[1452;6452),[4452;9452),...`.
> >
> > Would that help you?
> >
> > -Michael
> >
> >
> >
> > [1] http://docs.confluent.io/current/streams/developer-guide.html
> >
> >
> > On Mon, Mar 20, 2017 at 12:51 PM, Jon Yeargers <jon.yeargers@cedexis.com
> >
> > wrote:
> >
> > > Is this possible? Im wondering about gathering data from a stream into
> a
> > > series of windowed aggregators: minute, hour and day. A separate
> process
> > > would start at fixed intervals, query the appropriate state store for
> > > available values and then hopefully clear / zero / reset everything for
> > the
> > > next interval.
> > >
> > > I could use the retention period setting but I would (somehow) need to
> > > guarantee that the windows would reset on clock boundaries and not
> based
> > on
> > > start time for the app.
> > >
> >
>

Re: clearing an aggregation?

Posted by Jon Yeargers <jo...@cedexis.com>.
I get that the windows are aligned along seconds but this doesn't really
help with true clock alignment (IE top of the hour, midnight, etc).

I can imagine a strategy using overlapping windows. One would
(hypothetically) walk through the list until a window that spanned the
desired time was found.

Since apparently there isn't a way to iterate through Windowed KTables Im
guessing that this sort of 'aggregate and clear' approach still requires an
external datastore (like Redis). Please correct me if Im wrong.

On Mon, Mar 20, 2017 at 9:29 AM, Michael Noll <mi...@confluent.io> wrote:

> Jon,
>
> the windowing operation of Kafka's Streams API (in its DSL) aligns
> time-based windows to the epoch [1]:
>
> Quoting from e.g. hopping windows (sometimes called sliding windows in
> other technologies):
>
> > Hopping time windows are aligned to the epoch, with the lower interval
> bound
> > being inclusive and the upper bound being exclusive. “Aligned to the
> epoch”
> > means that the first window starts at timestamp zero.
> > For example, hopping windows with a size of 5000ms and an advance
> interval
> > (“hop”) of 3000ms have predictable window boundaries
> `[0;5000),[3000;8000),...`
> > — and not `[1000;6000),[4000;9000),...` or even something “random” like
> > `[1452;6452),[4452;9452),...`.
>
> Would that help you?
>
> -Michael
>
>
>
> [1] http://docs.confluent.io/current/streams/developer-guide.html
>
>
> On Mon, Mar 20, 2017 at 12:51 PM, Jon Yeargers <jo...@cedexis.com>
> wrote:
>
> > Is this possible? Im wondering about gathering data from a stream into a
> > series of windowed aggregators: minute, hour and day. A separate process
> > would start at fixed intervals, query the appropriate state store for
> > available values and then hopefully clear / zero / reset everything for
> the
> > next interval.
> >
> > I could use the retention period setting but I would (somehow) need to
> > guarantee that the windows would reset on clock boundaries and not based
> on
> > start time for the app.
> >
>

Re: clearing an aggregation?

Posted by Michael Noll <mi...@confluent.io>.
Jon,

the windowing operation of Kafka's Streams API (in its DSL) aligns
time-based windows to the epoch [1]:

Quoting from e.g. hopping windows (sometimes called sliding windows in
other technologies):

> Hopping time windows are aligned to the epoch, with the lower interval
bound
> being inclusive and the upper bound being exclusive. “Aligned to the
epoch”
> means that the first window starts at timestamp zero.
> For example, hopping windows with a size of 5000ms and an advance interval
> (“hop”) of 3000ms have predictable window boundaries
`[0;5000),[3000;8000),...`
> — and not `[1000;6000),[4000;9000),...` or even something “random” like
> `[1452;6452),[4452;9452),...`.

Would that help you?

-Michael



[1] http://docs.confluent.io/current/streams/developer-guide.html


On Mon, Mar 20, 2017 at 12:51 PM, Jon Yeargers <jo...@cedexis.com>
wrote:

> Is this possible? Im wondering about gathering data from a stream into a
> series of windowed aggregators: minute, hour and day. A separate process
> would start at fixed intervals, query the appropriate state store for
> available values and then hopefully clear / zero / reset everything for the
> next interval.
>
> I could use the retention period setting but I would (somehow) need to
> guarantee that the windows would reset on clock boundaries and not based on
> start time for the app.
>