Posted to users@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2016/08/02 15:37:10 UTC

Re: Kafka streams Issue

Hi Hamza,

We are also working on letting users have some indirect control over the
data volume based on caching:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams

Guozhang

On Fri, Jul 29, 2016 at 8:24 AM, Hamza HACHANI <ha...@supcom.tn>
wrote:

> Thanks i will try that.
>
>
> Hamza
>
> ________________________________
> From: Tauzell, Dave <Da...@surescripts.com>
> Sent: Friday, July 29, 2016 03:18:47
> To: users@kafka.apache.org
> Subject: RE: Kafka streams Issue
>
> Let's say you currently have:
>
> Processing App    ---> OUTPUT TOPIC ---> output consumer
>
> You would ideally like the processing app to only write to the output
> topic every minute, but cannot easily do this.  So what you might be able
> to do is:
>
>
> Processing App ---> INTERMEDIATE OUTPUT TOPIC --->  Coalesce Process
> ---> OUTPUT TOPIC
>
> The Coalesce Process is an application that does something like:
>
> Bucket = new List()
> Consumer = createConsumer()
> While ( Message = Consumer.next() ) {
>     Window = calculate current window
>     If Message is after Window:
>         Send Bucket to OUTPUT TOPIC
>         Bucket = new List()    // start a fresh bucket for the new window
>     Add Message to Bucket
> }
>
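[Editor's note] The coalesce loop above can be sketched as runnable Java. This is a broker-free simulation of just the bucketing logic: the timestamp values, one-minute window size, and the lists standing in for the topics are all illustrative, and a real implementation would read from the intermediate topic with a Kafka consumer.

```java
import java.util.ArrayList;
import java.util.List;

public class Coalesce {
    static final long WINDOW_MS = 60_000L;

    // Start of the tumbling one-minute window that a timestamp falls into.
    static long windowStart(long ts) {
        return ts - (ts % WINDOW_MS);
    }

    public static void main(String[] args) {
        // Illustrative message timestamps (ms); in practice these would come
        // from the INTERMEDIATE OUTPUT TOPIC via a Kafka consumer.
        long[] timestamps = {1_000, 30_000, 59_000, 61_000, 125_000};

        List<Long> bucket = new ArrayList<>();
        List<List<Long>> flushed = new ArrayList<>(); // stands in for OUTPUT TOPIC
        long currentWindow = windowStart(timestamps[0]);

        for (long ts : timestamps) {
            if (windowStart(ts) > currentWindow) {
                flushed.add(new ArrayList<>(bucket)); // "Send Bucket to OUTPUT TOPIC"
                bucket.clear();                       // start a fresh bucket
                currentWindow = windowStart(ts);
            }
            bucket.add(ts);
        }
        flushed.add(new ArrayList<>(bucket)); // flush the final, partial bucket

        System.out.println(flushed.size()); // 3 buckets: [0,60s), [60s,120s), [120s,...)
    }
}
```

Note that the triggering message must be added to the new bucket after the flush, and the final partial bucket flushed at the end, or messages are silently dropped.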
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |
> Dave.Tauzell@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
>
>
> -----Original Message-----
> From: Hamza HACHANI [mailto:hamza.hachani@supcom.tn]
> Sent: Friday, July 29, 2016 9:53 AM
> To: users@kafka.apache.org
> Subject: RE: Kafka streams Issue
>
> Hi Dave,
>
> Could you explain your idea a bit more?
> I can't figure out what you are suggesting.
> Thank you
>
> -Hamza
> ________________________________
> From: Tauzell, Dave <Da...@surescripts.com>
> Sent: Friday, July 29, 2016 02:39:53
> To: users@kafka.apache.org
> Subject: RE: Kafka streams Issue
>
> You could send the messages immediately to an intermediary topic.  Then
> have a consumer of that topic that pulls messages off and waits until the
> minute is up before forwarding them.
>
> -Dave
>
>
>
> -----Original Message-----
> From: Hamza HACHANI [mailto:hamza.hachani@supcom.tn]
> Sent: Friday, July 29, 2016 9:36 AM
> To: users@kafka.apache.org
> Subject: Kafka streams Issue
>
> > Good morning,
> >
> > I'm an ICT student at TELECOM BRETAGNE (a French school).
> > I followed your presentations on YouTube and found them really
> > interesting.
> > I'm trying to do some things with Kafka, and now I have been blocked
> > for about 3 days.
> > I'm trying to control the time at which my processing application sends
> > data to the output topic.
> > What I'm trying to do is to make the application process data from the
> > input topic all the time, but send the messages only at the end of a
> > minute/an hour/a month... (the notion of windowing).
> > For the moment, what I managed to do is that the application, instead
> > of sending data only at the end of the minute, sends it anytime it
> > receives it from the input topic.
> > Do you have any suggestions to help me?
> > I would be really grateful.
>
>
> Preliminary answer for now:
>
> > For the moment, what I managed to do is that the application, instead
> > of sending data only at the end of the minute, sends it anytime it
> > receives it from the input topic.
>
> This is actually the expected behavior at the moment.
>
> The main reason for this behavior is that, in stream processing, we never
> know whether there is still late-arriving data to be received.  For
> example, imagine you have 1-minute windows based on event-time.  Here, it
> may happen that, after the first 1-minute window has passed, another record
> arrives five minutes later but, according to the record's event-time, it
> should have still been part of that first 1-minute window.  In this case,
> what we typically want to happen is that the first 1-minute window will be
> updated/reprocessed with the late-arriving record included.  In other
> words, just because 1 minute has passed (= the 1-minute window is "done"),
> it does not mean that all the data for that time interval has actually been
> processed already -- so sending only a single update after 1 minute has
> passed would even produce incorrect results in many cases.  For this reason
> you currently see a downstream update anytime there is a new incoming data
> record ("sends it anytime it receives it from the input topic").  So the
> point here is to ensure correctness of processing.
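[Editor's note] The late-arrival point can be made concrete: with tumbling windows, membership is determined purely by the record's event-time, so a record that physically arrives five minutes late still maps back to the first window. A minimal sketch (the one-minute window size and timestamps are illustrative):

```java
// Window membership is decided by the record's event-time, not its arrival
// time. A record carrying event-time 00:30 belongs to the tumbling window
// [00:00, 01:00) even if it arrives five minutes later, so that first
// window must be updated/reprocessed when the record shows up.
public class LateArrival {
    static final long WINDOW_MS = 60_000L;

    // Start of the tumbling window that a given event-time falls into.
    static long windowStart(long eventTimeMs) {
        return eventTimeMs - (eventTimeMs % WINDOW_MS);
    }

    public static void main(String[] args) {
        long lateRecord = 30_000L;    // event-time 00:30, arrives at 05:30
        long freshRecord = 330_000L;  // event-time 05:30, arrives at 05:30
        System.out.println(windowStart(lateRecord));  // 0: reopens the first window
        System.out.println(windowStart(freshRecord)); // 300000: the current window
    }
}
```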
>
> That said, one known drawback of the current behavior is that users
> haven't been able to control (read: decrease/reduce) the rate/volume of the
> resulting downstream updates.  For example, if you have an input topic with
> a rate of 1 million msg/s (which is easy for Kafka), some users want to
> aggregate/window results primarily to reduce the input rate to a lower
> number (e.g. 1 thousand msg/s) so that the data can be fed from Kafka to
> other systems that might not scale as well as Kafka.  To help these use
> cases we will have a new configuration parameter in the next major version
> of Kafka that allows you to control the rate/volume of downstream updates.
> Here, the point is to help users optimize resource usage rather than
> correctness of processing.  This new parameter should also help you with
> your use case.  But even this new parameter is not based on strict time
> behavior or time windows.
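[Editor's note] The KIP-63 proposal linked at the top of this thread controls the downstream update rate through a record cache that absorbs consecutive updates to the same key, flushed at least once per commit interval. A configuration sketch, assuming the parameters land as described in the KIP (application id and broker address are placeholders):

```java
import java.util.Properties;

// Caching-based rate control per KIP-63 (parameter names from the KIP;
// this only builds the config, it does not start a Streams application).
public class StreamsCacheConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "windowed-aggregation-example"); // placeholder
        props.put("bootstrap.servers", "localhost:9092");            // placeholder
        // Size of the record cache that compacts consecutive updates per key;
        // a larger cache means fewer downstream updates.
        props.put("cache.max.bytes.buffering", "10485760"); // 10 MB
        // The cache is flushed downstream at least once per commit interval,
        // so this bounds how long an update can be held back.
        props.put("commit.interval.ms", "60000");
        System.out.println(props.getProperty("cache.max.bytes.buffering"));
    }
}
```

Note that, as the reply above says, this trades update volume for latency within a bound; it is not a strict time-window trigger.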
>
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>



-- 
-- Guozhang