Posted to users@kafka.apache.org by ni...@afshartous.com on 2019/01/17 02:51:36 UTC

Prioritized Topics for Kafka


Hi all,

On the dev list we’ve been discussing a proposed new feature (prioritized topics).  In a nutshell, when consuming from a set of topics with assigned priorities, consumption from lower-priority topics only occurs if there’s no data flowing in from a higher-priority topic.  

  https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics

One question is whether there are use cases for the proposed API.  If you think this would be useful and have use cases in mind, please reply with them.

It's also possible to implement prioritization with the existing API by using a combination of pausing, resuming, and local buffering.  The question is then whether it makes sense to introduce the proposed higher-level API to make this easier.
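
As a rough illustration of the pause/resume approach, here is a minimal Python sketch. It does not use a real Kafka client: FakeConsumer is a hypothetical in-memory stand-in, and poll_by_priority is an illustrative name, not part of any existing or proposed API. The real consumer's pause and resume operate on partitions rather than topic names, and local buffering is left out to keep the sketch short.

```python
from collections import deque


class FakeConsumer:
    """Hypothetical in-memory stand-in for a consumer (not a Kafka client)."""

    def __init__(self, topics):
        # topics: dict mapping topic name -> list of pending records
        self.queues = {t: deque(records) for t, records in topics.items()}
        self.paused = set()

    def pause(self, topic):
        self.paused.add(topic)

    def resume(self, topic):
        self.paused.discard(topic)

    def poll(self, max_records=10):
        """Return up to max_records (topic, record) pairs from non-paused topics."""
        out = []
        for topic, q in self.queues.items():
            if topic in self.paused:
                continue
            while q and len(out) < max_records:
                out.append((topic, q.popleft()))
        return out


def poll_by_priority(consumer, priorities, max_records=10):
    """Poll topics from highest to lowest priority.

    A lower-priority topic is only polled when every higher-priority
    topic returned nothing on this pass.
    """
    for topic in priorities:
        # Leave exactly one topic active, pause all the others.
        for other in priorities:
            if other == topic:
                consumer.resume(other)
            else:
                consumer.pause(other)
        batch = consumer.poll(max_records)
        if batch:
            return batch
    return []
```

Called in a loop with priorities ordered from highest to lowest, this only reaches the low-priority topic on passes where the high-priority topic has nothing to return.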

The responses will be used as input to determine if we move ahead with the proposal.  Thanks in advance for input.  

Cheers,
--
      Nick

Re: Prioritized Topics for Kafka

Posted by Michal Michalski <mi...@zalando.ie>.

Hi,

This sounds like a great idea, and thanks for reaching out for feedback.
Here are two use cases I've worked on that I'd seriously consider using
such a feature for:

1. Priority Republish of Data - in an event driven system, there's a
"republish" functionality used for e.g. fixing data affected by bugs,
updating data to remove some information (Hello GDPR!), feeding data to new
consumers who - for some reason - can't use the existing outbound data
streams and need new ones etc. These republishes can have different
priorities - bug fixes and "compliance-related" republishes are usually P1,
while bootstrapping a new consumer is something I'd probably know about weeks
in advance, so this could be a background task going on for days.

2. Weighted consumption priority for events - there's a system in which
the data model is built from a few building blocks that are put together
(aggregated) into a final, single object at the end. Some of the building
blocks are "core" and are very important, others are not - e.g. think about
a system that's processing shop product information: changes to pricing or
stock availability are probably very important (they could cause you to lose
money if you sell too cheap, or cause legal problems if you sell something
you don't have in stock), changes to some product properties might be less
important (e.g. adding a new photo of the chair you're selling is probably
important, but if it comes in late it's probably not the end of the world)
and updates to some ancillary information (e.g. average product rating) can
probably wait hours if not days.

While I know that this proposal is about the "consume all from P1 before
moving to P2", I'd like to point out that for some use cases, like (2)
above, it would be great to have the priorities used as "weights" that
define how much of the "consumption time unit" is spent in each of the
topics, so even if this is not planned to be implemented initially, I hope
that it was not completely ruled out as a future feature yet (and if it
was, may I ask why? are there any technical limitations to that or is there
some reason why it wouldn't work that I can't see? just my curiosity).
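
To make the "weights" idea concrete, here is a minimal Python sketch of weighted round-robin over in-memory queues. It is only an illustration under stated assumptions: weighted_drain and the topic names are hypothetical, nothing here comes from KIP-349, and the weights are counted in records per cycle as a simple approximation of a share of consumption time.

```python
from collections import deque


def weighted_drain(queues, weights):
    """Yield (topic, record) pairs in weighted round-robin order.

    queues:  dict mapping topic name -> list of pending records
    weights: dict mapping topic name -> records to take per cycle (>= 1)
    Only topics listed in weights are consumed.
    """
    if any(share < 1 for share in weights.values()):
        raise ValueError("every weight must be at least 1")
    pending = {t: deque(queues.get(t, ())) for t in weights}
    while any(pending.values()):
        for topic, share in weights.items():
            q = pending[topic]
            # Take this topic's share, or fewer if it runs dry.
            for _ in range(min(share, len(q))):
                yield topic, q.popleft()
```

With weights of 3 for a "pricing" topic and 1 for a "ratings" topic, each cycle takes up to three pricing records and then one ratings record, so the less important topic keeps making progress even while the important one is busy.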

Anyway, regardless of the details, such a feature would be very welcome!

Thanks,
Michał


Re: Prioritized Topics for Kafka

Posted by Ryanne Dolan <ry...@gmail.com>.

Nick, I think it's worth noting that Kafka is not a real-time system, and
most of these use cases (and TBH any that I can imagine) for prioritized
topics are real-time use cases.

For example, you wouldn't want to pilot a jet (a canonical real-time
system) based on Kafka events, as there is no guarantee that a message will
be received, stored, or processed at any particular rate or by any specific
deadline. Adding priorities does not change that.

It would be incredibly cool to support real-time processing in Kafka, but I
think this would require coordination between producers, brokers, and
consumers, as is the case with exactly-once delivery, and would likely
require huge architectural changes. If we did something like that,
priorities would be moot, as a producer could say here's a message, make
sure it's processed by this deadline, etc.

I'm certainly not saying there are no good real-time use cases (there are
many!), but I don't think they are a valid argument in support of
prioritized topics.

Ryanne



Re: Prioritized Topics for Kafka

Posted by Subhash Sriram <su...@gmail.com>.

Use case: we process documents from a variety of sources. We want to
process some of these sources in a priority order, but we don’t want to
necessarily finish all the higher priority sources before going to lower
priority because the volume of higher priority sources can be extremely
high.

We have solved this problem right now with multiple topics and we consume
from them in priority order, pausing every N records, to see if there are
any records from a lower priority source.

That being said, our company also has a use case similar to what Jeff
described. I think priority topics would be a very valuable feature!

Thanks,
Subhash


Re: Prioritized Topics for Kafka

Posted by Jeff Widman <je...@jeffwidman.com>.

Use case:
I work for a company that ingests events that come from both real-time
sources (which spike during the day) and historical log data.

We want the real-time data processed in minutes, and the historical log
data processed within hours. The consumer's business logic is the same.

Our current plan is to have two topics, and two downstream consumer groups.
We plan to have the "hot" consumer group for the real-time data provisioned
at the 90th percentile of inbound message rate, and the "cold" consumer
group for the log data at the 60th percentile, because it's okay if it takes
longer to absorb spikes in cold data.

Priority topics could *potentially* solve this.

However, one problem we've hit with a similar priority queuing system built
using a different tech stack was that if there was even a handful of
messages in the priority queues, those would keep the consumer just busy
enough that the cold data would never be processed.

The underlying root cause of the problem was two-fold:
1) the API only returned messages from a single queue at a time, so even if
the consumer requested 1,000 messages, the scheduler would see a message in
the hot queue and immediately return it. By the time the consumer processed
that and requested another batch of messages, one more message had trickled
into the hot queue. By contrast, if the API made sure to return a full batch,
filling it first from the hot queue and then from the cold queue, we could
still get batch efficiency at the network / consumer / downstream DB call
layers.
2) On the server side, switching between fetching messages for the
different queues seemed to be expensive. I'm not sure if that was due to an
inefficient scheduler, lack of memory, or poor I/O management. I suspect
Kafka wouldn't hit this as long as the messages were present in the page
cache, but it's just something to keep in mind--how this is implemented
matters from a performance/starvation standpoint.

So from a design standpoint, I think that means that for a priority
queueing design to minimize starvation, the design criterion should probably
be "return messages based on priority, but be sure to also keep the
consumer fully occupied".
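
A minimal Python sketch of that criterion, assuming plain in-memory queues rather than any real broker or client API (fill_batch is a hypothetical name used only for illustration): the batch is filled from the highest-priority queue first, and lower-priority queues top it up so the consumer always gets as full a batch as is available.

```python
from collections import deque


def fill_batch(queues_by_priority, batch_size):
    """Build one batch, highest priority first.

    queues_by_priority: list of deques, highest priority first.
    Records are removed from the queues as they are added to the batch.
    """
    batch = []
    for q in queues_by_priority:
        # Drain this queue until it is empty or the batch is full.
        while q and len(batch) < batch_size:
            batch.append(q.popleft())
        if len(batch) == batch_size:
            break
    return batch
```

With a single message trickling into the hot queue, a batch of three would hold that message plus two cold ones, instead of returning a batch of one and leaving the cold queue untouched.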

If done right, this would make our lives much easier operationally (only
one consumer group to manage, not two) and make our consumer usage more
efficient.


-- 

*Jeff Widman*
jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
<><

Re: Prioritized Topics for Kafka

Posted by Tobias Adamson <to...@stargazer.com.sg>.

Use case: prioritising current data

When processing messages, there is sometimes a need to reprocess old data.
It would be nice to be able to send the old data as messages to a separate topic that would only be processed when the current topic has no messages left to process.
This would prevent customers from experiencing delays in current data processing because message processors are busy with old data.


> On 17 Jan 2019, at 7:55 PM, Tim Ward <ti...@origamienergy.com> wrote:
> 
> Use cases: processing alerts.
> 
> High priority alerts ("a large chunk of your system has stopped providing service, immediate action essential") should be processed before low priority alerts ("some minor component has put out a not-very serious warning, somebody should probably have a look at it when they get bored"), of which there could be a long queue.
> 
> Urgent alerts (a phone call telling someone "you need to do this now") should be processed before non-urgent alerts (a phone call telling someone "FYI, such and such is going to happen in a couple of hours").
> 
> Tim Ward

RE: Prioritized Topics for Kafka

Posted by Tim Ward <ti...@origamienergy.com>.

Use cases: processing alerts.

High priority alerts ("a large chunk of your system has stopped providing service, immediate action essential") should be processed before low priority alerts ("some minor component has put out a not-very serious warning, somebody should probably have a look at it when they get bored"), of which there could be a long queue.

Urgent alerts (a phone call telling someone "you need to do this now") should be processed before non-urgent alerts (a phone call telling someone "FYI, such and such is going to happen in a couple of hours").

Tim Ward


Re: Prioritized Topics for Kafka

Posted by ni...@afshartous.com.


Hi all,

Thanks for everyone’s input.  After a very long discussion on the dev list, we’re not moving forward with KIP-349.  Some felt that this feature could be achieved using the existing capabilities of the current consumer API.  See the thread on the dev list (with KIP-349 in the subject heading) for more details.

Cheers,
--
      Nick
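
For readers wondering what the "pausing, resuming, and local buffering" approach mentioned in this thread might look like, here is a minimal sketch in Python. It is a library-independent simulation, not the KIP-349 design and not code from the dev list discussion. The function names (topics_to_pause, drain_by_priority), the topic names, and the in-memory buffers are all hypothetical. In a real consumer, the set returned by topics_to_pause would presumably be applied through the consumer's pause and resume calls on the assigned partitions before the next poll.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def topics_to_pause(priorities: Dict[str, int], backlog: Dict[str, int]) -> List[str]:
    """Return the topics that should be paused right now.

    A topic is paused when some topic with a strictly higher priority
    still has pending records. Larger numbers mean higher priority
    (an assumption of this sketch).
    """
    active = [topic for topic, pending in backlog.items() if pending > 0]
    if not active:
        return []  # nothing pending anywhere, so nothing needs pausing
    top = max(priorities[topic] for topic in active)
    return sorted(t for t in priorities if priorities[t] < top)


def drain_by_priority(
    priorities: Dict[str, int], buffers: Dict[str, Deque[str]]
) -> List[Tuple[str, str]]:
    """Consume locally buffered records, highest-priority topics first."""
    consumed: List[Tuple[str, str]] = []
    while True:
        backlog = {topic: len(queue) for topic, queue in buffers.items()}
        paused = set(topics_to_pause(priorities, backlog))
        ready = [t for t in sorted(buffers) if buffers[t] and t not in paused]
        if not ready:
            break  # every buffer is empty
        for topic in ready:
            consumed.append((topic, buffers[topic].popleft()))
    return consumed


# Hypothetical topics modelled on the alerting use case in this thread.
priorities = {"alerts-urgent": 2, "alerts-minor": 1}
buffers = {
    "alerts-urgent": deque(["u1", "u2"]),
    "alerts-minor": deque(["m1"]),
}
order = drain_by_priority(priorities, buffers)
print(order)
```

The sketch re-evaluates the paused set after every pass, so a lower-priority topic is only read once no higher-priority topic has anything pending. Topics sharing the same priority are served in turn.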