Posted to dev@pulsar.apache.org by Matteo Merli <mm...@apache.org> on 2019/01/17 06:58:01 UTC

[DISCUSSION] Delayed message delivery

After a long delay (no pun intended), I finally got through the
previous discussions around the delayed message delivery proposals.
I'm referring to PIP-26
https://github.com/apache/pulsar/wiki/PIP-26%3A-Delayed-Message-Delivery
and the Pull Request at #3155
https://github.com/apache/pulsar/pull/3155

To summarize these proposals (correct me if I'm getting any point wrong):

 * PIP-26
    - Producer sets an arbitrary timeout on each message
    - Broker keeps a hash-wheel timer (backed by a ledger) to keep track
      of messages for which the dispatch has to be deferred

 * PR #3155
    - Consumer specifies a fixed time delay to consume messages
    - Broker will defer delivery by that time

As I stated previously, we should try to avoid adding complexity to
the broker dispatching code, unless there's a clear benefit compared
to doing the same operation in the client library.

After discussing with Ivan, I wanted to share this alternative approach.

Key points:
  * Application sets an arbitrary timeout on each message
  * Broker is unchanged
  * Consumer (in the client library) will make these messages visible to
    the application after the delay has expired

Implementation notes:
  * Producer change is trivial. We just need to add a new field in the
    message metadata (similar to what is described in PIP-26)
  * On the consumer side, the following will happen:
     - Messages get added to the receiverQueue
     - When the application calls receive, we might get from the queue a
       message with a delay.
     - This message is not passed to the application. Rather, we insert
       the message ID into a priority queue (or equivalent structure),
       ordered by target time.
     - At this point messages are not added to the ack-timeout tracker
     - Periodically, we check the head of the priority queue to see if
       there's anything ready
        - If so, we request the broker to "redeliver" these messages,
          using the same mechanism as ack-timeout
          (CommandRedeliverUnacknowledgedMessages with a list of
          message ids)
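
The consumer-side steps above can be sketched roughly as follows. This is
only an illustration of the priority-queue idea, not code from the Pulsar
client: the class name DelayedMessageTracker, the use of plain strings as
message IDs and the explicit nowMillis parameter are all assumptions made
to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical client-side tracker for delayed messages (not Pulsar's code).
 * It keeps only message IDs, ordered by the time they should become visible.
 */
class DelayedMessageTracker {

    /** A message ID paired with its target delivery time in epoch millis. */
    private static final class Entry implements Comparable<Entry> {
        final String messageId;
        final long deliverAtMillis;

        Entry(String messageId, long deliverAtMillis) {
            this.messageId = messageId;
            this.deliverAtMillis = deliverAtMillis;
        }

        @Override
        public int compareTo(Entry other) {
            // Earliest target time sits at the head of the queue.
            return Long.compare(deliverAtMillis, other.deliverAtMillis);
        }
    }

    private final PriorityQueue<Entry> queue = new PriorityQueue<>();

    /** Called instead of handing a not-yet-due message to the application. */
    void add(String messageId, long deliverAtMillis) {
        queue.add(new Entry(messageId, deliverAtMillis));
    }

    /**
     * Called periodically. Removes and returns the IDs whose target time has
     * passed; the caller would then ask the broker to redeliver these IDs.
     */
    List<String> pollReady(long nowMillis) {
        List<String> ready = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().deliverAtMillis <= nowMillis) {
            ready.add(queue.poll().messageId);
        }
        return ready;
    }

    /** Number of message IDs still waiting for their target time. */
    int size() {
        return queue.size();
    }
}
```

A periodic task in the consumer would call pollReady with the current time
and put the returned IDs into the redelivery request. Only the IDs and their
target times are held in memory.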

This approach will ensure:
 * We can support arbitrary delays
 * No changes and no overhead in the broker. No need to configure
   policies for delay activation
 * Works well with the existing flow control mechanism: messages are
   dequeued so that we can process messages with smaller delays
 * The amount of memory required on the client side is limited.
     - We just keep message ids (we could consider caching a few
       messages as well, as an optimization)
     - Broker has a limit of unacked messages pushed to a consumer
       (default 50K). I don't expect this to be a particular problem.
       If there are a lot of messages with big differences in the delay
       value, the worst case would be that the applied delay is higher
       than requested for some of the messages.

Any thoughts on this?

--
Matteo Merli
<mm...@apache.org>

Re: [DISCUSSION] Delayed message delivery

Posted by Matteo Merli <mm...@apache.org>.

On Tue, Apr 16, 2019 at 8:08 PM Ezequiel Lovelle
<ez...@gmail.com> wrote:
>
> Hi Matteo!
>
> Great work! Really neat and clear, I like it!
>
> My 2 cents, I prefer adding deliverAt() and deliverAfter() on
> ProducerBuilder rather than TypedMessageBuilder.
> That would result in a more limited version because the delay will be the
> same for all the messages, but I think it covers most of the cases.
>
> I consider having a per-message delay quite ambitious because it could
> lead to a very compromising situation, e.g. producing messages with a
> very wide range of delays.

Having per-producer delay won't simplify the implementation since a
single topic could have producers with different delays.
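
To illustrate the point: once a delay is recorded on each message as an
absolute timestamp, a producer-wide setting is only a default applied to
every message, and a topic with several producers still carries a mix of
timestamps. A minimal sketch, assuming a hypothetical builder (the names
DelayedMessageSketch and withProducerDelay and the explicit nowMillis
argument are made up for this example and are not the API in the PR):

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for a message builder; the class and field names
 * are illustrative, not Pulsar's real metadata layout.
 */
class DelayedMessageSketch {

    /** Absolute delivery time in epoch millis; 0 means "deliver immediately". */
    private long deliverAtMillis = 0L;

    /** Per-message absolute deadline. */
    DelayedMessageSketch deliverAt(long timestampMillis) {
        this.deliverAtMillis = timestampMillis;
        return this;
    }

    /** Per-message relative delay, reduced to an absolute deadline. */
    DelayedMessageSketch deliverAfter(long delay, TimeUnit unit, long nowMillis) {
        return deliverAt(nowMillis + unit.toMillis(delay));
    }

    long deliverAtMillis() {
        return deliverAtMillis;
    }

    /** A producer-wide fixed delay is the same deliverAfter on every message. */
    static DelayedMessageSketch withProducerDelay(long producerDelayMillis, long nowMillis) {
        return new DelayedMessageSketch()
                .deliverAfter(producerDelayMillis, TimeUnit.MILLISECONDS, nowMillis);
    }
}
```

Either way the consumer or broker side ends up comparing a per-message
timestamp against the current time, so the fixed-delay variant would not
save any machinery in this sketch.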

In my view, having arbitrary delays on messages is the only compelling
reason for implementing the feature in Pulsar compared to leaving the
implementation to the applications.

We have several real-world use cases that need this kind of feature. This
implementation aims to work well and be efficient even when the delays are
very different.

> If this makes sense to you, it also offers the opportunity to have
> its counterpart methods receiveAt() and receiveAfter() in ConsumerBuilder
> and it also covers all the spectrum of *fixed delay messages* at both
> sides.

How would this work for receiveAt(), since the consumer will not know which
message it is about to receive and at what time it was published?
Also, for receiveAfter(), how would that work when the producer has already
set a different delay, or if different consumers (on the same subscription)
are requesting different delays?
Can you give an example of a use case that would need a consumer-driven
fixed delay and cannot use the producer's arbitrary delay feature?


>
> Thanks!
> --
> *Ezequiel Lovelle*
>
>
> On Tue, 16 Apr 2019 at 21:26, Matteo Merli <ma...@gmail.com> wrote:
>
> > Thanks everyone for the feedback.
> >
> > I actually went through and took a shot at implementing this in
> > https://github.com/apache/pulsar/pull/4062
> >
> > I think this implementation should address all the concerns raised in
> > this thread.
> > Everyone involved, please review the change in depth.
> >
> >
> > Thanks,
> > Matteo
> >
> > --
> > Matteo Merli
> > <ma...@gmail.com>
> >
> > On Thu, Mar 14, 2019 at 8:10 PM Sijie Guo <gu...@gmail.com> wrote:
> > >
> > > On Fri, Mar 8, 2019 at 11:11 PM Ezequiel Lovelle <ezequiellovelle@gmail.com> wrote:
> > >
> > > > > Seems like we are implementing per message timers.
> > > >
> > > > As per PR #3155 <https://github.com/apache/pulsar/pull/3155>, nope.
> > > > Each message won't have a Timer class per se, just a long field
> > > > representing its expiration deadline, and there will be just one, and
> > > > only one, scheduled task per consumer at any given time.
> > > >
> > > > > Seems simpler to just have delay on a topic level.
> > > >
> > > > I think the complexity would be very similar on both sides
> > > > (producer/consumer).
> > > > An important aspect here would be the decision to provide this feature
> > > > (delay messages on the consumer) separately from the producer, so that
> > > > the consumer can decide to 'delay' all messages regardless of the
> > > > producer.
> > > >
> > > > > if we are able to find a way to plug a new "fixed delay" dispatcher
> > > > > without touching other dispatcher logic, is that a good approach for
> > > > > the community to proceed on this direction?
> > > >
> > > > Great question! I like this path.
> > > >
> > > > One solution that I can think of is something similar to what Matteo
> > > > did here:
> > > > https://github.com/apache/pulsar/pull/3615
> > > >
> > > > So, we can have a separate class handling consumers with delay,
> > > > extending the normal consumer base. The problem with this approach
> > > > would arise in the future if we want to have consumers with multiple
> > > > behaviours,
> > > > e.g. a delayed consumer plus some future feature not present right now.
> > > >
> > >
> > >
> > >
> > >
> > > >
> > > > Anyway, if everyone agrees with Sijie's question, we might discuss
> > > > this on a separate thread.
> > > >
> > >
> > > It seems that there are no objections. So we can probably move forward
> > > with the idea of having a separate dispatcher for fixed delayed
> > > subscriptions. This would isolate the impact of modifying existing
> > > dispatchers.
> > >
> > >
> > > >
> > > > --
> > > > *Ezequiel Lovelle*
> > > >
> > > >
> > > > On Sat, 2 Mar 2019 at 08:45, Ali Ahmed <ah...@gmail.com> wrote:
> > > >
> > > > > Seems like we are implementing per message timers.
> > > > >
> > > > > Not aware of any log pub sub that does that except RocketMQ; not
> > > > > sure how performant that is.
> > > > >
> > > > >
> > > >
> > https://github.com/apache/rocketmq/blob/2b692c912d18c0f9889fd73358581bcccf37bbbe/store/src/main/java/org/apache/rocketmq/store/schedule/ScheduleMessageService.java
> > > > >
> > > > > Seems simpler to just have delay on a topic level.  The cursor for
> > > > > client subscriptions can make messages available after a delay.
> > > > > I don't know if we can achieve significant throughput with so many
> > > > > active timers.
> > > > >
> > > > > On Sat, Mar 2, 2019 at 2:49 AM Sijie Guo <gu...@gmail.com> wrote:
> > > > >
> > > > > > I am trying to draw a conclusion on this email thread.
> > > > > >
> > > > > > > Maybe some way to plug to the broker some logic without
> > > > > > > interfering with its core?
> > > > > > > In our business fixed delay at consumer level regardless of any
> > > > > > > producer configuration is a big win due to easy implementation
> > > > > > > and usage.
> > > > > >
> > > > > > Based on Ezequiel's last comment, if we are able to find a way to
> > > > > > plug a new "fixed delay" dispatcher without touching other
> > > > > > dispatcher logic, is that a good approach for the community to
> > > > > > proceed on this direction?
> > > > > >
> > > > > > - Sijie
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 20, 2019 at 8:26 AM 李鹏辉gmail <co...@gmail.com> wrote:
> > > > > >
> > > > > > > Sorry to hear that DLQ causes GC.
> > > > > > >
> > > > > > > Agree with what was discussed before: the dispatcher is a
> > > > > > > performance-sensitive piece of code.
> > > > > > > If we make changes to the dispatcher, we must pay attention to
> > > > > > > memory overhead and blocking.
> > > > > > >
> > > > > > > I prefer the fixed delayed message solution (aka delayed time
> > > > > > > level). Users can define multiple topics with different delays.
> > > > > > > Each topic is still a FIFO model.
> > > > > > >
> > > > > > > We can improve the user experience by packaging this in the
> > > > > > > client API: topics can be created automatically, and users can
> > > > > > > customize the delay level.
> > > > > > >
> > > > > > > In our scenario, this can already meet most of the needs.
> > > > > > > Currently it depends on the DLQ feature. We know from users that
> > > > > > > the experience is not very good.
> > > > > > > Users need to maintain the message expiry themselves.
> > > > > > >
> > > > > > > So, if we can avoid complexity of use and do not impose a
> > > > > > > performance burden on message dispatching, I prefer implementing
> > > > > > > it on the broker side (the broker does not need to sort messages
> > > > > > > by time, it just needs to check whether the tail message can be
> > > > > > > dispatched; I don’t think this will cause a dispatching
> > > > > > > performance problem).
> > > > > > >
> > > > > > > As for more complicated delayed messages (e.g. arbitrary delayed
> > > > > > > delivery), I don’t think Pulsar needs to support such complicated
> > > > > > > scenarios (as we discussed before).
> > > > > > > In our scenario, we have more complicated message requirements
> > > > > > > (e.g. delayed messages that can be paused, stopped, and re-run,
> > > > > > > or cron messages).
> > > > > > >
> > > > > > > However, these cases are not very widely used.
> > > > > > >
> > > > > > > - Penghui
> > > > > > >
> > > > > > >
> > > > > > > > On Feb 20, 2019, at 06:37, Sebastián Schepens <se...@mercadolibre.com.INVALID> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > > I am really not into any details of the proposed
> > > > > > > > implementation, but was just wondering, has anyone had a look
> > > > > > > > at how Uber implemented this in Cherami? Cherami seems very
> > > > > > > > similar to Pulsar, and its storage system also seems very
> > > > > > > > similar to BookKeeper. They seem to implement delayed queues by
> > > > > > > > storing the time as part of the key in RocksDB and using sorted
> > > > > > > > iterators. Could this be done in Pulsar as well?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Sebastian
> > > > > > > >
> > > > > > > > On Tue, Feb 19, 2019 at 6:02 PM Dave Fisher <dave2wave@comcast.net> wrote:
> > > > > > > >
> > > > > > > >> Hi -
> > > > > > > >>
> > > > > > > >> Well, it does, but can this be implemented without building a
> > > > > > > >> delayQueue? It seems to me that a delayQueue both breaks
> > > > > > > >> resiliency if the broker goes down and would certainly add
> > > > > > > >> overhead. Perhaps my idea to discard responses that are too
> > > > > > > >> new and then retrieve them once they are out of the delayed
> > > > > > > >> timeframe would be simpler?
> > > > > > > >>
> > > > > > > >> Again I am somewhat naive to the details. I’m not sure that
> > > > > > > >> the path through the code is kept to an absolute minimum when
> > > > > > > >> you have a Consumer with a nonzero delay?
> > > > > > > >>
> > > > > > > >> Regards,
> > > > > > > >> Dave
> > > > > > > >>
> > > > > > > >>> On Feb 19, 2019, at 12:39 PM, Ezequiel Lovelle <ezequiellovelle@gmail.com> wrote:
> > > > > > > >>>
> > > > > > > >>> Hi Dave!
> > > > > > > >>>
> > > > > > > >>>> I wonder if clients can add an optional argument to the
> > > > > > > >>>> broker call when pulling events. The argument would be the
> > > > > > > >>>> amount of delay. Any messages younger than the delay are
> > > > > > > >>>> not returned by the broker.
> > > > > > > >>>
> > > > > > > >>> This is exactly what
> > > > > > > >>> https://github.com/apache/pulsar/pull/3155 does :).
> > > > > > > >>> We still need to decide if we want to add this feature on
> > > > > > > >>> the client side or the broker side; the pull request does it
> > > > > > > >>> on the broker.
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> *Ezequiel Lovelle*
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> On Tue, 19 Feb 2019 at 17:06, Dave Fisher <dave2wave@comcast.net> wrote:
> > > > > > > >>>
> > > > > > > >>>> Hi -
> > > > > > > >>>>
> > > > > > > >>>> My thoughts here may be completely useless but I wonder if
> > > > > > > >>>> clients can add an optional argument to the broker call
> > > > > > > >>>> when pulling events. The argument would be the amount of
> > > > > > > >>>> delay. Any messages younger than the delay are not returned
> > > > > > > >>>> by the broker.
> > > > > > > >>>>
> > > > > > > >>>> Regards,
> > > > > > > >>>> Dave
> > > > > > > >>>>
> > > > > > > >>>>> On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <ezequiellovelle@gmail.com> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> The recent changes made to support DLQ caused major
> > > > > > > >>>>>> problems with garbage collection
> > > > > > > >>>>>
> > > > > > > >>>>> If garbage collection is a big concern maybe we could add
> > > > > > > >>>>> some config parameter on the broker to disable the usage
> > > > > > > >>>>> of this feature and return BrokerMetadataException in this
> > > > > > > >>>>> situation, giving the administrator the power to decide
> > > > > > > >>>>> whether to offer this feature or not.
> > > > > > > >>>>>
> > > > > > > >>>>>> is it acceptable to do it at broker side?
> > > > > > > >>>>>
> > > > > > > >>>>> I think this is the big question that needs to be answered.
> > > > > > > >>>>>
> > > > > > > >>>>>> can we just have a separated dispatcher for fixed delayed
> > > > > > > >>>>>> subscription?
> > > > > > > >>>>>
> > > > > > > >>>>> I will try to do a completely new approach, simpler, and
> > > > > > > >>>>> more isolated from broker logic. Maybe some way to plug to
> > > > > > > >>>>> the broker some logic without interfering with its core?
> > > > > > > >>>>>
> > > > > > > >>>>> In our business fixed delay at consumer level regardless
> > > > > > > >>>>> of any producer configuration is a big win due to easy
> > > > > > > >>>>> implementation and usage.
> > > > > > > >>>>>
> > > > > > > >>>>> --
> > > > > > > >>>>> *Ezequiel Lovelle*
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> On Wed, 13 Feb 2019 at 23:25, Sijie Guo <guosijie@gmail.com> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> Agreed that dispatcher is a performance sensitive piece
> > > > > > > >>>>>> of code. Feel bad to hear that DLQ causes GC. Are there
> > > > > > > >>>>>> any issues tracking those items you guys identified with
> > > > > > > >>>>>> DLQ changes?
> > > > > > > >>>>>>
> > > > > > > >>>>>>> How is this different from a subscription running behind?
> > > > > > > >>>>>>
> > > > > > > >>>>>> As far as I understand from the discussion at #3155, I
> > > > > > > >>>>>> don't think there is a fundamental difference from a
> > > > > > > >>>>>> backlogged subscriber.
> > > > > > > >>>>>> The discussion point will mainly be - if a delayed
> > > > > > > >>>>>> subscription can be implemented with a simpler approach
> > > > > > > >>>>>> at broker side without changing other dispatcher logic,
> > > > > > > >>>>>> is it acceptable to do it at broker side? So we don't
> > > > > > > >>>>>> have to reimplement the same mechanism at different
> > > > > > > >>>>>> language clients. I think that's the same tradeoff we
> > > > > > > >>>>>> were discussing for generic delayed messages.
> > > > > > > >>>>>>
> > > > > > > >>>>>> My thought would be - can we just have a separated
> > > > > > > >>>>>> dispatcher for fixed delayed subscription? The logic can
> > > > > > > >>>>>> be ISOLATED from other normal dispatchers. If users don't
> > > > > > > >>>>>> enable delayed subscription, they will not exercise that
> > > > > > > >>>>>> dispatcher. This can be a good direction to explore for
> > > > > > > >>>>>> future changes that are related to dispatchers.
> > > > > > > >>>>>>
> > > > > > > >>>>>> - Sijie
> > > > > > > >>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Thu, Feb 14, 2019 at 8:43 AM Joe F <joefrancisk@gmail.com> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> Delayed subscription is simpler, and probably worth
> > > > > > > >>>>>>> doing in the broker IF done right.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> How is this different from a subscription running
> > > > > > > >>>>>>> behind? Why does supporting that require this complex a
> > > > > > > >>>>>>> change in the dispatcher, when we already support
> > > > > > > >>>>>>> backlogged subscribers?
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I am extremely wary of changes in the dispatcher. The
> > > > > > > >>>>>>> recent changes made to support DLQ caused major problems
> > > > > > > >>>>>>> with garbage collection, broker failure and service
> > > > > > > >>>>>>> interruptions for us, even though we ARE NOT using the
> > > > > > > >>>>>>> DLQ feature. Not a pleasant experience.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> This is a very performance sensitive piece of code, and
> > > > > > > >>>>>>> it should be treated as such.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Joe
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On Wed, Feb 13, 2019 at 3:58 PM Sijie Guo <guosijie@gmail.com> wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>> Hi all,
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> I am going to wrap up the discussion regarding delayed
> > > > > > > >>>>>>>> delivery use cases.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> For arbitrary delayed delivery, there are a few +1s to
> > > > > > > >>>>>>>> doing PIP-26 in functions. I am assuming that we will
> > > > > > > >>>>>>>> go down this path, unless there are other proposals.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> However there is a use case Lovelle pointed out about
> > > > > > > >>>>>>>> "Fixed Delayed Message". More specifically it is
> > > > > > > >>>>>>>> https://github.com/apache/pulsar/pull/3155
> > > > > > > >>>>>>>> (The caption in #3155 is a bit misleading). IMO it is a
> > > > > > > >>>>>>>> "delayed subscription": basically all messages in the
> > > > > > > >>>>>>>> subscription are delayed to dispatch in a given time
> > > > > > > >>>>>>>> interval. Consensus on this feature has not yet been
> > > > > > > >>>>>>>> achieved. Basically, there will be two approaches for
> > > > > > > >>>>>>>> this:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> a) DON'T treat "fixed delayed message" as a different
> > > > > > > >>>>>>>> case. Just use the same approach as in PIP-26.
> > > > > > > >>>>>>>> b) Treat "fixed delayed message" as a different case,
> > > > > > > >>>>>>>> e.g. we can better call it "delayed subscription" or
> > > > > > > >>>>>>>> whatever can distinguish it from general arbitrary
> > > > > > > >>>>>>>> delayed delivery. Use the approach proposed/discussed
> > > > > > > >>>>>>>> in #3155.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> I would like the community to discuss this and also
> > > > > > > >>>>>>>> come to an agreement, so Lovelle can move forward with
> > > > > > > >>>>>>>> the approach agreed by the community.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> Thanks,
> > > > > > > >>>>>>>> Sijie
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Tue, Jan 29, 2019 at 6:30 AM Ezequiel Lovelle <ezequiellovelle@gmail.com> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> "I agree, but that is *not what #3155 tries to achieve."
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> This typo made this phrase nonsense, sorry!
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Mon, 28 Jan 2019, 16:44 Ezequiel Lovelle <ezequiellovelle@gmail.com> wrote:
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>>> What exactly is the delayed delivery use case?
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> This is helpful on systems relying on Pulsar for
> > > > > > > >>>>>>>>>> persistence guarantees and using it for
> > > > > > > >>>>>>>>>> synchronization or some sort of checks, but on such
> > > > > > > >>>>>>>>>> systems it is common to have some overhead committing
> > > > > > > >>>>>>>>>> data to persistent storage, maybe due to a buffering
> > > > > > > >>>>>>>>>> mechanism or to distributing the data across the
> > > > > > > >>>>>>>>>> network before it becomes available.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Surely there are more use cases I haven't come across
> > > > > > > >>>>>>>>>> right now.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Random insertion and deletion is not what FIFO
> > > > > > > >>>>>>>>>>> queues like Pulsar are designed for.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I agree, but that is now what #3155 tries to achieve.
> > > > > > > >>>>>>>>>> #3155 is just a fixed delay for all messages in a
> > > > > > > >>>>>>>>>> consumer; that's the reason that the implementation
> > > > > > > >>>>>>>>>> of #3155 is quite trivial.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> +1 from me for doing PIP-26 in functions.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> --
> > > > > > > >>>>>>>>>> *Ezequiel Lovelle*
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Sat, 26 Jan 2019 at 09:57, Yuva raj <uvaraj6@gmail.com> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Considering the way Pulsar is built, +1 for doing
> > > > > > > >>>>>>>>>>> PIP-26 in functions. I am thinking more along the
> > > > > > > >>>>>>>>>>> lines of: publish it to Pulsar, and we make it
> > > > > > > >>>>>>>>>>> available in a different queuing system if you need
> > > > > > > >>>>>>>>>>> priority and delayed message support. Pulsar
> > > > > > > >>>>>>>>>>> functions would be enough for this kind of use case.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Fri, 25 Jan 2019 at 22:29, Ivan Kelly <ivank@apache.org> wrote:
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Correct. PIP-26 can be implemented in Functions. I
> > > > > > > >>>>>>>>>>>>> believe the last discussion in the PIP-26 thread
> > > > > > > >>>>>>>>>>>>> kind of agreed on the functions approach.
> > > > > > > >>>>>>>>>>>>> If the community is okay with PIP-26 in functions,
> > > > > > > >>>>>>>>>>>>> I think that is probably a good approach to start.
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> +1 for doing it in functions.
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> -Ivan
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> --
> > > > > > > >>>>>>>>>>> *Thanks*
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> *Yuvaraj L*
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Ali
> > > > >
> > > >
> >

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

Hi Matteo!

Great work! Really neat and clear, I like it!

My 2 cents, I prefer adding deliverAt() and deliverAfter() on
ProducerBuilder rather than TypedMessageBuilder.
That would result in a more limited version because the delay will be the
same for all the messages, but I think it covers most of the cases.

I consider having a per-message delay quite ambitious because it could
lead to a very compromising situation, e.g. producing messages with a
very wide range of delays.

If this makes sense to you, it also offers the opportunity to have
its counterpart methods receiveAt() and receiveAfter() in ConsumerBuilder
and it also covers all the spectrum of *fixed delay messages* at both
sides.

Thanks!
--
*Ezequiel Lovelle*


On Tue, 16 Apr 2019 at 21:26, Matteo Merli <ma...@gmail.com> wrote:

> Thanks everyone for the feedback.
>
> I actually went through and took a shot at implementing this in
> https://github.com/apache/pulsar/pull/4062
>
> I think this implementation should address all the concerns raised in
> this thread.
> Everyone involved, please review the change in depth.
>
>
> Thanks,
> Matteo
>
> --
> Matteo Merli
> <ma...@gmail.com>
>
> On Thu, Mar 14, 2019 at 8:10 PM Sijie Guo <gu...@gmail.com> wrote:
> >
> > On Fri, Mar 8, 2019 at 11:11 PM Ezequiel Lovelle <ezequiellovelle@gmail.com> wrote:
> >
> > > > Seems like we are implementing per message timers.
> > >
> > > As per PR #3155 <https://github.com/apache/pulsar/pull/3155>, nope.
> > > Each message won't have a Timer class per se, just a long field
> > > representing its expiration deadline, and there will be just one, and
> > > only one, scheduled task per consumer at any given time.
> > >
> > > > Seems simpler to just have delay on a topic level.
> > >
> > > I think the complexity would be very similar on both sides
> > > (producer/consumer).
> > > An important aspect here would be the decision to provide this feature
> > > (delay messages on the consumer) separately from the producer, so that
> > > the consumer can decide to 'delay' all messages regardless of the
> > > producer.
> > >
> > > > if we are able to find a way to plug a new "fixed delay" dispatcher
> > > > without touching other dispatcher logic, is that a good approach for
> > > > the community to proceed on this direction?
> > >
> > > Great question! I like this path.
> > >
> > > One solution that I can think of is something similar to what Matteo
> > > did here:
> > > https://github.com/apache/pulsar/pull/3615
> > >
> > > So, we can have a separate class handling consumers with delay,
> > > extending the normal consumer base. The problem with this approach
> > > would arise in the future if we want to have consumers with multiple
> > > behaviours,
> > > e.g. a delayed consumer plus some future feature not present right now.
> > >
> >
> >
> >
> >
> > >
> > > Anyway, if everyone agrees with Sijie's question, we might discuss
> > > this on a separate thread.
> > >
> >
> > It seems that there are no objections. So we can probably move forward
> > with the idea of having a separate dispatcher for fixed delayed
> > subscriptions. This would isolate the impact of modifying existing
> > dispatchers.
> >
> >
> > >
> > > --
> > > *Ezequiel Lovelle*
> > >
> > >
> > > On Sat, 2 Mar 2019 at 08:45, Ali Ahmed <ah...@gmail.com> wrote:
> > >
> > > > Seems like we are implementing per message timers.
> > > >
> > > > Not aware of any log pub sub that does that except RocketMQ; not
> > > > sure how performant that is.
> > > >
> > > >
> > >
> https://github.com/apache/rocketmq/blob/2b692c912d18c0f9889fd73358581bcccf37bbbe/store/src/main/java/org/apache/rocketmq/store/schedule/ScheduleMessageService.java
> > > >
> > > > Seems simpler to just have delay on a topic level.  The cursor for
> > > > client subscriptions can make messages available after a delay.
> > > > I don't know if we can achieve significant throughput with so many
> > > > active timers.
> > > >
> > > > On Sat, Mar 2, 2019 at 2:49 AM Sijie Guo <gu...@gmail.com> wrote:
> > > >
> > > > > I am trying to draw a conclusion on this email thread.
> > > > >
> > > > > > Maybe some way to plug to the broker some logic without
> > > > > > interfering with its core?
> > > > > > In our business fixed delay at consumer level regardless of any
> > > > > > producer configuration is a big win due to easy implementation
> > > > > > and usage.
> > > > >
> > > > > Based on Ezequiel's last comment, if we are able to find a way to
> > > > > plug a new "fixed delay" dispatcher without touching other
> > > > > dispatcher logic, is that a good approach for the community to
> > > > > proceed on this direction?
> > > > >
> > > > > - Sijie
> > > > >
> > > > >
> > > > > On Wed, Feb 20, 2019 at 8:26 AM 李鹏辉gmail <co...@gmail.com> wrote:
> > > > >
> > > > > > Sorry to hear that DLQ causes GC.
> > > > > >
> > > > > > Agree with what was discussed before: the dispatcher is a
> > > > > > performance-sensitive piece of code.
> > > > > > If we make changes to the dispatcher, we must pay attention to
> > > > > > memory overhead and blocking.
> > > > > >
> > > > > > I prefer the fixed delayed message solution (aka delayed time
> > > > > > level). Users can define multiple topics with different delays.
> > > > > > Each topic is still a FIFO model.
> > > > > >
> > > > > > We can improve the user experience by packaging this in the
> > > > > > client API: topics can be created automatically, and users can
> > > > > > customize the delay level.
> > > > > >
> > > > > > In our scenario, this can already meet most of the needs.
> > > > > > Currently it depends on the DLQ feature. We know from users that
> > > > > > the experience is not very good.
> > > > > > Users need to maintain the message expiry themselves.
> > > > > >
> > > > > > So, if we can avoid complexity of use and do not impose a
> > > > > > performance burden on message dispatching, I prefer implementing
> > > > > > it on the broker side (the broker does not need to sort messages
> > > > > > by time, it just needs to check whether the tail message can be
> > > > > > dispatched; I don’t think this will cause a dispatching
> > > > > > performance problem).
> > > > > >
> > > > > > As for more complicated delayed messages (e.g. arbitrary delayed
> > > > > > delivery), I don’t think Pulsar needs to support such complicated
> > > > > > scenarios (as we discussed before).
> > > > > > In our scenario, we have more complicated message requirements
> > > > > > (e.g. delayed messages that can be paused, stopped, and re-run,
> > > > > > or cron messages).
> > > > > >
> > > > > > However, these cases are not very widely used.
> > > > > >
> > > > > > - Penghui
> > > > > >
> > > > > >
> > > > > > > > On Feb 20, 2019, at 06:37, Sebastián Schepens
> > > > > > > <se...@mercadolibre.com.INVALID> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > > > I am not really familiar with the details of the proposed
> > > > > > > > implementation, but I was just wondering: has anyone had a look
> > > > > > > > at how Uber implemented this in Cherami? Cherami seems very
> > > > > > > > similar to Pulsar, and its storage system also seems very similar
> > > > > > > > to BookKeeper. They seem to implement delayed queues by storing
> > > > > > > > the time as part of the key in RocksDB and using sorted
> > > > > > > > iterators. Could this be done in Pulsar as well?
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Sebastian
> > > > > > >
> > > > > > > On Tue, Feb 19, 2019 at 6:02 PM Dave Fisher <
> dave2wave@comcast.net
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >> Hi -
> > > > > > >>
> > > > > > >> Well, it does, but can this be implemented without building a
> > > > > > delayQueue?
> > > > > > >> It seems to me that a delayQueue both breaks resiliency if the
> > > > broker
> > > > > > goes
> > > > > > >> down and would certainly add overhead. Perhaps my idea to
> discard
> > > > > > responses
> > > > > > >> that are too new and then retrieve once they are out of the
> > > delayed
> > > > > > >> timeframe would be simpler?
> > > > > > >>
> > > > > > >> Again I am somewhat naive to the details. I’m not sure that
> the
> > > path
> > > > > > >> through the code is kept to an absolute minimum when you have
> a
> > > > > Consumer
> > > > > > >> with a nonzero delay?
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Dave
> > > > > > >>
> > > > > > >>> On Feb 19, 2019, at 12:39 PM, Ezequiel Lovelle <
> > > > > > >> ezequiellovelle@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>> Hi Dave!
> > > > > > >>>
> > > > > > >>>> I wonder if clients can add an optional argument to the
> broker
> > > > call
> > > > > > when
> > > > > > >>> pulling events. The argument would be the amount of delay.
> Any
> > > > > messages
> > > > > > >>> younger than the delay are not returned by the broker.
> > > > > > >>>
> > > > > > >>> This is exactly what
> https://github.com/apache/pulsar/pull/3155
> > > > does
> > > > > > :).
> > > > > > >>> We still need to decide if we want to add this feature at
> client
> > > > side
> > > > > > or
> > > > > > >>> broker side, the pull request does it on the broker.
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> *Ezequiel Lovelle*
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On Tue, 19 Feb 2019 at 17:06, Dave Fisher <
> dave2wave@comcast.net
> > > >
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>>> Hi -
> > > > > > >>>>
> > > > > > >>>> My thoughts here may be completely useless but I wonder if
> > > clients
> > > > > can
> > > > > > >> add
> > > > > > >>>> an optional argument to the broker call when pulling
> events. The
> > > > > > >> argument
> > > > > > >>>> would be the amount of delay. Any messages younger than the
> > > delay
> > > > > are
> > > > > > >> not
> > > > > > >>>> returned by the broker.
> > > > > > >>>>
> > > > > > >>>> Regards,
> > > > > > >>>> Dave
> > > > > > >>>>
> > > > > > >>>>> On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <
> > > > > > >>>> ezequiellovelle@gmail.com> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> The recent changes made to support DLQ caused major
> problems
> > > > with
> > > > > > >>>> garbage
> > > > > > >>>>> collection
> > > > > > >>>>>
> > > > > > >>>>> If garbage collection is a big concern maybe we could add
> some
> > > > > config
> > > > > > >>>>> parameter on the broker to disable the usage of this
> feature
> > > and
> > > > > > return
> > > > > > >>>>> BrokerMetadataException in this situation, giving the
> power to
> > > > the
> > > > > > >>>>> administrator whether to offer this feature or not.
> > > > > > >>>>>
> > > > > > >>>>>> is it acceptable to do it at broker side?
> > > > > > >>>>>
> > > > > > >>>>> I think this is the big question that needs to be answered.
> > > > > > >>>>>
> > > > > > >>>>>> can we just have a separated dispatcher for fixed delayed
> > > > > > >> subscription?
> > > > > > >>>>>
> > > > > > >>>>> I will try to do a completely new approach, simpler, and
> more
> > > > > > isolated
> > > > > > >>>>> from broker logic. Maybe some way to plug to the broker
> some
> > > > logic
> > > > > > >>>> without
> > > > > > >>>>> interfering with its core?
> > > > > > >>>>>
> > > > > > >>>>> In our business fixed delay at consumer level regardless
> of any
> > > > > > >> producer
> > > > > > >>>>> configuration is a big win due to easy implementation and
> > > usage.
> > > > > > >>>>>
> > > > > > >>>>> --
> > > > > > >>>>> *Ezequiel Lovelle*
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Wed, 13 Feb 2019 at 23:25, Sijie Guo <
> guosijie@gmail.com>
> > > > > wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Agreed that dispatcher is a performance sensitive piece of
> > > code.
> > > > > > Feel
> > > > > > >>>> bad
> > > > > > >>>>>> to hear that DLQ causes GC. Are there any issues tracking
> > > those
> > > > > > items
> > > > > > >>>> you
> > > > > > >>>>>> guys identified with DLQ changes?
> > > > > > >>>>>>
> > > > > > >>>>>>> How is this different from a subscription running behind?
> > > > > > >>>>>>
> > > > > > > >>>>>> As far as I understand from the discussion at #3155, I don't think
> > > > > > > >>>>>> there is a fundamental difference from a backlogged subscriber.
> > > > > > >>>>>> The discussion point will mainly be - if a delayed
> > > subscription
> > > > > can
> > > > > > be
> > > > > > >>>>>> implemented with a simpler approach at broker side without
> > > > > changing
> > > > > > >>>> other
> > > > > > >>>>>> dispatcher logic,
> > > > > > >>>>>> is it acceptable to do it at broker side? So we don't
> have to
> > > > > > >>>> reimplement
> > > > > > >>>>>> the same mechanism at different language clients. I think
> > > that's
> > > > > the
> > > > > > >>>> same
> > > > > > >>>>>> tradeoff we were discussing for generic delayed messages.
> > > > > > >>>>>>
> > > > > > >>>>>> My thought would be - can we just have a separated
> dispatcher
> > > > for
> > > > > > >> fixed
> > > > > > >>>>>> delayed subscription? The logic can be ISOLATED from other
> > > > normal
> > > > > > >>>>>> dispatchers. if users don't enable delayed subscription,
> they
> > > > will
> > > > > > not
> > > > > > >>>>>> exercise that dispatcher. This can be a good direction to
> > > > explore
> > > > > > for
> > > > > > >>>>>> future changes that are related to dispatchers.
> > > > > > >>>>>>
> > > > > > >>>>>> - Sijie
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> On Thu, Feb 14, 2019 at 8:43 AM Joe F <
> joefrancisk@gmail.com>
> > > > > > wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> Delayed subscription is simpler, and probably worth
> doing in
> > > > the
> > > > > > >> broker
> > > > > > >>>>>> IF
> > > > > > >>>>>>> done right.
> > > > > > >>>>>>>
> > > > > > >>>>>>> How is this different from a subscription running behind?
> > > Why
> > > > > does
> > > > > > >>>>>>> supporting that require this complex a change in the
> > > > dispatcher,
> > > > > > when
> > > > > > >>>> we
> > > > > > >>>>>>> already support backlogged subscribers?
> > > > > > >>>>>>>
> > > > > > >>>>>>> I am extremely wary of changes in the dispatcher. The
> recent
> > > > > > changes
> > > > > > >>>> made
> > > > > > >>>>>>> to support DLQ caused major problems with garbage
> collection,
> > > > > > broker
> > > > > > >>>>>>> failure  and service interruptions for us. Even though
> we ARE
> > > > NOT
> > > > > > >> using
> > > > > > >>>>>> the
> > > > > > >>>>>>> DLQ feature. Not a pleasant experience.
> > > > > > >>>>>>>
> > > > > > >>>>>>> This is a very performance sensitive piece of code, and
> it
> > > > should
> > > > > > be
> > > > > > >>>>>>> treated as such.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Joe
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Wed, Feb 13, 2019 at 3:58 PM Sijie Guo <
> > > guosijie@gmail.com>
> > > > > > >> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> Hi all,
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I am going to wrap up the discussion regarding delayed
> > > > delivery
> > > > > > use
> > > > > > >>>>>>> cases.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> For arbitrary delayed delivery, there are a few +1s to
> doing
> > > > > > PIP-26
> > > > > > >> in
> > > > > > >>>>>>>> functions. I am assuming that we will go down this path,
> > > > unless
> > > > > > >> there
> > > > > > >>>>>> are
> > > > > > >>>>>>>> other proposals.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> However there is a use case Lovelle pointed out about
> "Fixed
> > > > > > Delayed
> > > > > > >>>>>>>> Message". More specifically it is
> > > > > > >>>>>>>> https://github.com/apache/pulsar/pull/3155
> > > > > > >>>>>>>> (The caption in #3155 is a bit misleading). IMO it is a
> > > > "delayed
> > > > > > >>>>>>>> subscription", basically all messages in the
> subscription is
> > > > > > delayed
> > > > > > >>>> to
> > > > > > >>>>>>>> dispatch in a given time interval. The consensus of this
> > > > feature
> > > > > > is
> > > > > > >>>> not
> > > > > > >>>>>>> yet
> > > > > > >>>>>>>> achieved. Basically, there will be two approaches for
> this:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> a) DONT treat "fixed delayed message" as a different
> case.
> > > > Just
> > > > > > use
> > > > > > >>>> the
> > > > > > >>>>>>>> same approach as in PIP-26.
> > > > > > >>>>>>>> b) treat "fixed delayed message" as a different case,
> e.g.
> > > we
> > > > > can
> > > > > > >>>>>> better
> > > > > > >>>>>>>> call it "delayed subscription" or whatever can
> distinguish
> > > it
> > > > > from
> > > > > > >>>>>>> general
> > > > > > >>>>>>>> arbitrary delayed delivery. Use the approach
> > > > proposed/discussed
> > > > > in
> > > > > > >>>>>> #3155.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I would like the community to discuss this and also
> come to
> > > an
> > > > > > >>>>>> agreement.
> > > > > > >>>>>>>> So Lovelle can move forward with the approach agreed by
> the
> > > > > > >> community.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Thanks,
> > > > > > >>>>>>>> Sijie
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Tue, Jan 29, 2019 at 6:30 AM Ezequiel Lovelle <
> > > > > > >>>>>>>> ezequiellovelle@gmail.com>
> > > > > > >>>>>>>> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> "I agree, but that is *not what #3155 tries to
> achieve."
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> This typo made this phrase nonsense, sorry!
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Mon, 28 Jan 2019, 16:44 Ezequiel Lovelle <
> > > > > > >>>>>> ezequiellovelle@gmail.com
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>>> What exactly is the delayed delivery use case?
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> This is helpful on systems relaying on pulsar for
> > > persistent
> > > > > > >>>>>>> guarantees
> > > > > > >>>>>>>>>> and using it for synchronization or some sort of
> checks,
> > > but
> > > > > on
> > > > > > >>>>>> such
> > > > > > >>>>>>>>>> systems is common to have some overhead committing
> data on
> > > > > > >>>>>> persistent
> > > > > > >>>>>>>>>> storage maybe due to buffered mechanism or
> distributing
> > > the
> > > > > data
> > > > > > >>>>>>> across
> > > > > > >>>>>>>>>> the network before being available.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Surely would be more use cases I don't came across
> right
> > > > now.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Random insertion and deletion is not what FIFO queues
> > > like
> > > > > > Pulsar
> > > > > > >>>>>>> are
> > > > > > >>>>>>>>>> designed for.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> I agree, but that is now what #3155 tries to achieve.
> > > #3155
> > > > is
> > > > > > >>>>>> just a
> > > > > > >>>>>>>>>> fixed delay for all message in a consumer, that's the
> > > reason
> > > > > > that
> > > > > > >>>>>> the
> > > > > > >>>>>>>>>> implementation of #3155 is quite trivial.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> +1 from me for doing PIP-26 in functions.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> --
> > > > > > >>>>>>>>>> *Ezequiel Lovelle*
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> On Sat, 26 Jan 2019 at 09:57, Yuva raj <
> uvaraj6@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Considering the way pulsar is built +1 for doing
> PIP-26
> > > in
> > > > > > >>>>>>> functions.
> > > > > > >>>>>>>> I
> > > > > > >>>>>>>>> am
> > > > > > >>>>>>>>>>> more of thinking in a way like publish it pulsar we
> will
> > > > make
> > > > > > it
> > > > > > >>>>>>>>> available
> > > > > > >>>>>>>>>>> in a different queuing system if you need priority
> and
> > > > delay
> > > > > > >>>>>>> messages
> > > > > > >>>>>>>>>>> support. Pulsar functions would go enough for this
> kind
> > > of
> > > > > use
> > > > > > >>>>>>> cases.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> On Fri, 25 Jan 2019 at 22:29, Ivan Kelly <
> > > ivank@apache.org
> > > > >
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Correct. PIP-26 can be implemented in Functions. I
> > > > believe
> > > > > > the
> > > > > > >>>>>>>> last
> > > > > > >>>>>>>>>>>>> discussion in PIP-26 thread kind of agree on
> functions
> > > > > > >>>>>> approach.
> > > > > > >>>>>>>>>>>>> If the community is okay with PIP-26 in functions,
> I
> > > > think
> > > > > > >>>>>> that
> > > > > > >>>>>>> is
> > > > > > >>>>>>>>>>>> probably
> > > > > > >>>>>>>>>>>>> a good approach to start.
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> +1 for doing it in functions.
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> -Ivan
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> --
> > > > > > >>>>>>>>>>> *Thanks*
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> *Yuvaraj L*
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -Ali
> > > >
> > >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Matteo Merli <ma...@gmail.com>.

Thanks everyone for the feedback.

I actually went ahead and took a shot at implementing this in
https://github.com/apache/pulsar/pull/4062

I think this implementation should address all the concerns raised in
this thread.
Could everyone involved please review the change in depth?


Thanks,
Matteo

--
Matteo Merli
<ma...@gmail.com>

On Thu, Mar 14, 2019 at 8:10 PM Sijie Guo <gu...@gmail.com> wrote:
>
> On Fri, Mar 8, 2019 at 11:11 PM Ezequiel Lovelle <ez...@gmail.com>
> wrote:
>
> > > Seems like we are implementing per message timers.
> >
> > As per PR #3155 <https://github.com/apache/pulsar/pull/3155>, no. Each
> > message won't have a Timer class per se, just a long field representing
> > its expiration deadline, and there will be one, and only one, scheduled
> > task per consumer at any given time.
> >
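A minimal sketch of the bookkeeping described above, assuming a single
fixed delay per consumer and publish times that do not decrease in arrival
order. The class and method names are hypothetical and this is not the code
from PR #3155. Each pending message carries only a long deadline, and the
consumer's single scheduled task would ask the tracker how long to wait
before checking again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch of per-consumer fixed-delay tracking (not PR #3155). */
final class FixedDelayTracker {
    /** A pending message: an id plus a plain long deadline, no Timer object. */
    private static final class Pending {
        final long messageId;
        final long deadlineMs;

        Pending(long messageId, long deadlineMs) {
            this.messageId = messageId;
            this.deadlineMs = deadlineMs;
        }
    }

    private final long delayMs;
    // Assumption: with one fixed delay and non-decreasing publish times,
    // arrival order equals deadline order, so a FIFO queue is enough.
    private final ArrayDeque<Pending> pending = new ArrayDeque<>();

    FixedDelayTracker(long delayMs) {
        this.delayMs = delayMs;
    }

    void add(long messageId, long publishTimeMs) {
        pending.addLast(new Pending(messageId, publishTimeMs + delayMs));
    }

    /** Removes and returns the ids whose deadline has passed at nowMs. */
    List<Long> pollReady(long nowMs) {
        List<Long> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peekFirst().deadlineMs <= nowMs) {
            ready.add(pending.pollFirst().messageId);
        }
        return ready;
    }

    /** How long the single scheduled task should wait before its next check. */
    OptionalLong nextWakeupDelayMs(long nowMs) {
        if (pending.isEmpty()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Math.max(0L, pending.peekFirst().deadlineMs - nowMs));
    }
}
```

In this sketch the one scheduled task per consumer would call pollReady,
dispatch the returned ids, and reschedule itself using nextWakeupDelayMs.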
> > > Seems simpler to just have delay on a topic level.
> >
> > I think complexity would be very similar on both sides (producer/consumer).
> > An important aspect here would be the decision to provide this feature
> > (delaying messages on the consumer) separately from the producer, so that
> > the consumer can decide to 'delay' all messages regardless of the producer.
> >
> > > if we are able to find a way to plug a new "fixed delay" dispatcher
> > without touching other dispatcher logic, is that a good approach for the
> > community to proceed on this direction?
> >
> > Great question! I like this path.
> >
> > One solution I can think of is something similar to what Matteo did here:
> > https://github.com/apache/pulsar/pull/3615
> >
> > So, we can have a separate class handling consumers with delay, extending
> > the normal consumer base. The problem with this approach would arise in the
> > future if we want to have consumers with multiple behaviours.
> >
> > e.g. a delayed consumer plus some future feature not present right now.
> >
>
>
>
>
> >
> > Anyway, if everyone agrees with Sijie's question, we might discuss this in a
> > separate thread.
> >
>
> It seems that there are no objections, so we can probably move forward with
> the idea of having a separate dispatcher for fixed delayed subscriptions.
> This would avoid the impact of modifying the existing dispatchers.
>
>
> >
> > --
> > *Ezequiel Lovelle*
> >
> >
> > On Sat, 2 Mar 2019 at 08:45, Ali Ahmed <ah...@gmail.com> wrote:
> >
> > > Seems like we are implementing per message timers.
> > >
> > > Not aware of any log pub sub that does that except RocketMQ; not sure
> > > how performant that is.
> > >
> > >
> > https://github.com/apache/rocketmq/blob/2b692c912d18c0f9889fd73358581bcccf37bbbe/store/src/main/java/org/apache/rocketmq/store/schedule/ScheduleMessageService.java
> > >
> > > Seems simpler to just have delay on a topic level.  The cursor for client
> > > subscriptions can make messages available after a delay.
> > > I don't know if we can achieve significant throughput with so many active
> > > timers.
> > >
> > > On Sat, Mar 2, 2019 at 2:49 AM Sijie Guo <gu...@gmail.com> wrote:
> > >
> > > > I am trying to draw a conclusion on this email thread.
> > > >
> > > > > > Maybe some way to plug to the broker some logic without
> > > > > > interfering with its core?
> > > > > > In our business fixed delay at consumer level regardless of any
> > > > > > producer configuration is a big win due to easy implementation and
> > > > > > usage.
> > > > >
> > > > > Based on Ezequiel's last comment, if we are able to find a way to plug a
> > > > > new "fixed delay" dispatcher without touching other dispatcher logic, is
> > > > > that a good approach for the community to proceed on this direction?
> > > >
> > > > - Sijie
> > > >
> > > >
> > > > On Wed, Feb 20, 2019 at 8:26 AM 李鹏辉gmail <co...@gmail.com>
> > > wrote:
> > > >
> > > > > > Sorry to hear that DLQ causes GC problems.
> > > > > >
> > > > > > As discussed before, I agree that the dispatcher is a
> > > > > > performance-sensitive piece of code.
> > > > > > If we make changes to the dispatcher, we must pay attention to memory
> > > > > > overhead and blocking.
> > > > > >
> > > > > > I prefer the fixed delayed message solution (a.k.a. delay time levels).
> > > > > > Users can define multiple topics with different delays. Each topic is
> > > > > > still a FIFO model.
> > > > > >
> > > > > > We can improve the user experience by packaging this in the client API:
> > > > > > topics can be created automatically, and users can customize the delay
> > > > > > levels.
> > > > >
> > > > > > In our scenario, this already meets most of our needs. We currently
> > > > > > depend on the DLQ feature, and we know from our users that the
> > > > > > experience is not very good: users need to maintain message expiry
> > > > > > themselves.
> > > > > >
> > > > > > So, if we can avoid complexity of use and not impose a performance
> > > > > > burden on message dispatching, I prefer to implement it on the broker
> > > > > > side (the broker does not need to sort messages by time; it just needs
> > > > > > to check whether the tail message can be dispatched, and I don’t think
> > > > > > this will cause a dispatching performance problem).
> > > > > >
> > > > > > As for more complicated delayed messages (e.g. arbitrary delayed
> > > > > > delivery), I don’t think Pulsar needs to support such complicated
> > > > > > scenarios (as we discussed before).
> > > > > > In our scenario, we have more complicated message requirements (e.g.
> > > > > > delayed messages that can be paused, stopped, and re-run, or cron
> > > > > > messages).
> > > > > >
> > > > > > However, these cases are not very widely used.
> > > > >
> > > > > - Penghui
> > > > >
> > > > >
> > > > > > > On Feb 20, 2019, at 06:37, Sebastián Schepens
> > > > > > <se...@mercadolibre.com.INVALID> wrote:
> > > > > >
> > > > > > Hi,
> > > > > > > I am not really familiar with the details of the proposed
> > > > > > > implementation, but I was just wondering: has anyone had a look at
> > > > > > > how Uber implemented this in Cherami? Cherami seems very similar to
> > > > > > > Pulsar, and its storage system also seems very similar to
> > > > > > > BookKeeper. They seem to implement delayed queues by storing the
> > > > > > > time as part of the key in RocksDB and using sorted iterators.
> > > > > > > Could this be done in Pulsar as well?
> > > > > >
> > > > > > Cheers,
> > > > > > Sebastian
> > > > > >
> > > > > > On Tue, Feb 19, 2019 at 6:02 PM Dave Fisher <dave2wave@comcast.net
> > >
> > > > > wrote:
> > > > > >
> > > > > >> Hi -
> > > > > >>
> > > > > >> Well, it does, but can this be implemented without building a
> > > > > delayQueue?
> > > > > >> It seems to me that a delayQueue both breaks resiliency if the
> > > broker
> > > > > goes
> > > > > >> down and would certainly add overhead. Perhaps my idea to discard
> > > > > responses
> > > > > >> that are too new and then retrieve once they are out of the
> > delayed
> > > > > >> timeframe would be simpler?
> > > > > >>
> > > > > >> Again I am somewhat naive to the details. I’m not sure that the
> > path
> > > > > >> through the code is kept to an absolute minimum when you have a
> > > > Consumer
> > > > > >> with a nonzero delay?
> > > > > >>
> > > > > >> Regards,
> > > > > >> Dave
> > > > > >>
> > > > > >>> On Feb 19, 2019, at 12:39 PM, Ezequiel Lovelle <
> > > > > >> ezequiellovelle@gmail.com> wrote:
> > > > > >>>
> > > > > >>> Hi Dave!
> > > > > >>>
> > > > > >>>> I wonder if clients can add an optional argument to the broker
> > > call
> > > > > when
> > > > > >>> pulling events. The argument would be the amount of delay. Any
> > > > messages
> > > > > >>> younger than the delay are not returned by the broker.
> > > > > >>>
> > > > > >>> This is exactly what https://github.com/apache/pulsar/pull/3155
> > > does
> > > > > :).
> > > > > >>> We still need to decide if we want to add this feature at client
> > > side
> > > > > or
> > > > > >>> broker side, the pull request does it on the broker.
> > > > > >>>
> > > > > >>> --
> > > > > >>> *Ezequiel Lovelle*
> > > > > >>>
> > > > > >>>
> > > > > >>> On Tue, 19 Feb 2019 at 17:06, Dave Fisher <dave2wave@comcast.net
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Hi -
> > > > > >>>>
> > > > > >>>> My thoughts here may be completely useless but I wonder if
> > clients
> > > > can
> > > > > >> add
> > > > > >>>> an optional argument to the broker call when pulling events. The
> > > > > >> argument
> > > > > >>>> would be the amount of delay. Any messages younger than the
> > delay
> > > > are
> > > > > >> not
> > > > > >>>> returned by the broker.
> > > > > >>>>
> > > > > >>>> Regards,
> > > > > >>>> Dave
> > > > > >>>>
> > > > > >>>>> On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <
> > > > > >>>> ezequiellovelle@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>>> The recent changes made to support DLQ caused major problems
> > > with
> > > > > >>>> garbage
> > > > > >>>>> collection
> > > > > >>>>>
> > > > > >>>>> If garbage collection is a big concern maybe we could add some
> > > > config
> > > > > >>>>> parameter on the broker to disable the usage of this feature
> > and
> > > > > return
> > > > > >>>>> BrokerMetadataException in this situation, giving the power to
> > > the
> > > > > >>>>> administrator whether to offer this feature or not.
> > > > > >>>>>
> > > > > >>>>>> is it acceptable to do it at broker side?
> > > > > >>>>>
> > > > > >>>>> I think this is the big question that needs to be answered.
> > > > > >>>>>
> > > > > >>>>>> can we just have a separated dispatcher for fixed delayed
> > > > > >> subscription?
> > > > > >>>>>
> > > > > >>>>> I will try to do a completely new approach, simpler, and more
> > > > > isolated
> > > > > >>>>> from broker logic. Maybe some way to plug to the broker some
> > > logic
> > > > > >>>> without
> > > > > >>>>> interfering with its core?
> > > > > >>>>>
> > > > > >>>>> In our business fixed delay at consumer level regardless of any
> > > > > >> producer
> > > > > >>>>> configuration is a big win due to easy implementation and
> > usage.
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>> *Ezequiel Lovelle*
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Wed, 13 Feb 2019 at 23:25, Sijie Guo <gu...@gmail.com>
> > > > wrote:
> > > > > >>>>>
> > > > > >>>>>> Agreed that dispatcher is a performance sensitive piece of
> > code.
> > > > > Feel
> > > > > >>>> bad
> > > > > >>>>>> to hear that DLQ causes GC. Are there any issues tracking
> > those
> > > > > items
> > > > > >>>> you
> > > > > >>>>>> guys identified with DLQ changes?
> > > > > >>>>>>
> > > > > >>>>>>> How is this different from a subscription running behind?
> > > > > >>>>>>
> > > > > > >>>>>> As far as I understand from the discussion at #3155, I don't think
> > > > > > >>>>>> there is a fundamental difference from a backlogged subscriber.
> > > > > >>>>>> The discussion point will mainly be - if a delayed
> > subscription
> > > > can
> > > > > be
> > > > > >>>>>> implemented with a simpler approach at broker side without
> > > > changing
> > > > > >>>> other
> > > > > >>>>>> dispatcher logic,
> > > > > >>>>>> is it acceptable to do it at broker side? So we don't have to
> > > > > >>>> reimplement
> > > > > >>>>>> the same mechanism at different language clients. I think
> > that's
> > > > the
> > > > > >>>> same
> > > > > >>>>>> tradeoff we were discussing for generic delayed messages.
> > > > > >>>>>>
> > > > > >>>>>> My thought would be - can we just have a separated dispatcher
> > > for
> > > > > >> fixed
> > > > > >>>>>> delayed subscription? The logic can be ISOLATED from other
> > > normal
> > > > > >>>>>> dispatchers. if users don't enable delayed subscription, they
> > > will
> > > > > not
> > > > > >>>>>> exercise that dispatcher. This can be a good direction to
> > > explore
> > > > > for
> > > > > >>>>>> future changes that are related to dispatchers.
> > > > > >>>>>>
> > > > > >>>>>> - Sijie
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Thu, Feb 14, 2019 at 8:43 AM Joe F <jo...@gmail.com>
> > > > > wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Delayed subscription is simpler, and probably worth doing in
> > > the
> > > > > >> broker
> > > > > >>>>>> IF
> > > > > >>>>>>> done right.
> > > > > >>>>>>>
> > > > > >>>>>>> How is this different from a subscription running behind?
> > Why
> > > > does
> > > > > >>>>>>> supporting that require this complex a change in the
> > > dispatcher,
> > > > > when
> > > > > >>>> we
> > > > > >>>>>>> already support backlogged subscribers?
> > > > > >>>>>>>
> > > > > >>>>>>> I am extremely wary of changes in the dispatcher. The recent
> > > > > changes
> > > > > >>>> made
> > > > > >>>>>>> to support DLQ caused major problems with garbage collection,
> > > > > broker
> > > > > >>>>>>> failure  and service interruptions for us. Even though we ARE
> > > NOT
> > > > > >> using
> > > > > >>>>>> the
> > > > > >>>>>>> DLQ feature. Not a pleasant experience.
> > > > > >>>>>>>
> > > > > >>>>>>> This is a very performance sensitive piece of code, and it
> > > should
> > > > > be
> > > > > >>>>>>> treated as such.
> > > > > >>>>>>>
> > > > > >>>>>>> Joe
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Wed, Feb 13, 2019 at 3:58 PM Sijie Guo <
> > guosijie@gmail.com>
> > > > > >> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi all,
> > > > > >>>>>>>>
> > > > > >>>>>>>> I am going to wrap up the discussion regarding delayed
> > > delivery
> > > > > use
> > > > > >>>>>>> cases.
> > > > > >>>>>>>>
> > > > > >>>>>>>> For arbitrary delayed delivery, there are a few +1s to doing
> > > > > PIP-26
> > > > > >> in
> > > > > >>>>>>>> functions. I am assuming that we will go down this path,
> > > unless
> > > > > >> there
> > > > > >>>>>> are
> > > > > >>>>>>>> other proposals.
> > > > > >>>>>>>>
> > > > > >>>>>>>> However there is a use case Lovelle pointed out about "Fixed
> > > > > Delayed
> > > > > >>>>>>>> Message". More specifically it is
> > > > > >>>>>>>> https://github.com/apache/pulsar/pull/3155
> > > > > >>>>>>>> (The caption in #3155 is a bit misleading). IMO it is a
> > > "delayed
> > > > > >>>>>>>> subscription", basically all messages in the subscription is
> > > > > delayed
> > > > > >>>> to
> > > > > >>>>>>>> dispatch in a given time interval. The consensus of this
> > > feature
> > > > > is
> > > > > >>>> not
> > > > > >>>>>>> yet
> > > > > >>>>>>>> achieved. Basically, there will be two approaches for this:
> > > > > >>>>>>>>
> > > > > >>>>>>>> a) DONT treat "fixed delayed message" as a different case.
> > > Just
> > > > > use
> > > > > >>>> the
> > > > > >>>>>>>> same approach as in PIP-26.
> > > > > >>>>>>>> b) treat "fixed delayed message" as a different case, e.g.
> > we
> > > > can
> > > > > >>>>>> better
> > > > > >>>>>>>> call it "delayed subscription" or whatever can distinguish
> > it
> > > > from
> > > > > >>>>>>> general
> > > > > >>>>>>>> arbitrary delayed delivery. Use the approach
> > > proposed/discussed
> > > > in
> > > > > >>>>>> #3155.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I would like the community to discuss this and also come to
> > an
> > > > > >>>>>> agreement.
> > > > > >>>>>>>> So Lovelle can move forward with the approach agreed by the
> > > > > >> community.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks,
> > > > > >>>>>>>> Sijie
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Tue, Jan 29, 2019 at 6:30 AM Ezequiel Lovelle <
> > > > > >>>>>>>> ezequiellovelle@gmail.com>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> "I agree, but that is *not what #3155 tries to achieve."
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> This typo made this phrase nonsense, sorry!
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Mon, 28 Jan 2019, 16:44 Ezequiel Lovelle <
> > > > > >>>>>> ezequiellovelle@gmail.com
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>>> What exactly is the delayed delivery use case?
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> This is helpful on systems relying on Pulsar for persistence
> > > > > > >>>>>>>>>> guarantees and using it for synchronization or some sort of
> > > > > > >>>>>>>>>> checks. On such systems it is common to have some overhead
> > > > > > >>>>>>>>>> committing data to persistent storage, maybe due to a
> > > > > > >>>>>>>>>> buffering mechanism or to distributing the data across the
> > > > > > >>>>>>>>>> network before it becomes available.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> There are surely more use cases that I haven't come across
> > > > > > >>>>>>>>>> right now.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Random insertion and deletion is not what FIFO queues
> > like
> > > > > Pulsar
> > > > > >>>>>>> are
> > > > > >>>>>>>>>> designed for.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> I agree, but that is now what #3155 tries to achieve.
> > #3155
> > > is
> > > > > >>>>>> just a
> > > > > >>>>>>>>>> fixed delay for all message in a consumer, that's the
> > reason
> > > > > that
> > > > > >>>>>> the
> > > > > >>>>>>>>>> implementation of #3155 is quite trivial.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> +1 from me for doing PIP-26 in functions.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> --
> > > > > >>>>>>>>>> *Ezequiel Lovelle*
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Sat, 26 Jan 2019 at 09:57, Yuva raj <uvaraj6@gmail.com
> > >
> > > > > wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Considering the way pulsar is built +1 for doing PIP-26
> > in
> > > > > >>>>>>> functions.
> > > > > >>>>>>>> I
> > > > > >>>>>>>>> am
> > > > > >>>>>>>>>>> more of thinking in a way like publish it pulsar we will
> > > make
> > > > > it
> > > > > >>>>>>>>> available
> > > > > >>>>>>>>>>> in a different queuing system if you need priority and
> > > delay
> > > > > >>>>>>> messages
> > > > > >>>>>>>>>>> support. Pulsar functions would go enough for this kind
> > of
> > > > use
> > > > > >>>>>>> cases.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Fri, 25 Jan 2019 at 22:29, Ivan Kelly <
> > ivank@apache.org
> > > >
> > > > > >>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Correct. PIP-26 can be implemented in Functions. I
> > > believe
> > > > > the
> > > > > >>>>>>>> last
> > > > > >>>>>>>>>>>>> discussion in PIP-26 thread kind of agree on functions
> > > > > >>>>>> approach.
> > > > > >>>>>>>>>>>>> If the community is okay with PIP-26 in functions, I
> > > think
> > > > > >>>>>> that
> > > > > >>>>>>> is
> > > > > >>>>>>>>>>>> probably
> > > > > >>>>>>>>>>>>> a good approach to start.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> +1 for doing it in functions.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> -Ivan
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>> *Thanks*
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> *Yuvaraj L*
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > -Ali
> > >
> >

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

On Fri, Mar 8, 2019 at 11:11 PM Ezequiel Lovelle <ez...@gmail.com>
wrote:

> > Seems like we are implementing per message timers.
>
> As per PR #3155 <https://github.com/apache/pulsar/pull/3155>, no. A message
> won't have its own Timer object, just a long field holding its expiration
> deadline, and there will be one, and only one, scheduled task per consumer
> at any given time.
>
> > Seems simpler to just have delay on a topic level.
>
> I think the complexity would be very similar on both sides (producer/consumer).
> An important aspect here is the decision to provide this feature (delaying
> messages on the consumer) separately from the producer, so the consumer can
> decide to 'delay' all messages regardless of the producer.
>
> > if we are able to find a way to plug a new "fixed delay" dispatcher
> > without touching other dispatcher logic, is that a good approach for the
> > community to proceed on this direction?
>
> Great question! I like this path.
>
> One solution I can think of is something similar to what Matteo did here:
> https://github.com/apache/pulsar/pull/3615
>
> So, we can have a separate class that handles consumers with delay and
> extends the normal consumer base. The problem with this approach would come
> in the future, if we want consumers that combine multiple behaviours,
> e.g. a delayed consumer plus some future feature not present right now.
>




>
> Anyway, if everyone agrees with Sijie's question, we might discuss this in a
> separate thread.
>

It seems that there are no objections, so we can probably move forward with
the idea of having a separate dispatcher for fixed delayed subscriptions.
This would keep the change isolated from the existing dispatchers.
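To make the "separate dispatcher" idea concrete, here is a minimal Java
sketch. It is only an illustration under stated assumptions: the names
TimedMessage, Dispatcher, ImmediateDispatcher, FixedDelayDispatcher and
Dispatchers are hypothetical and are not Pulsar broker classes, and the
backlog is assumed to be in publish order. The point is only that a
subscription without a delay never touches the delayed code path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message holder: payload plus broker publish time.
final class TimedMessage {
    final String payload;
    final long publishTimeMillis;

    TimedMessage(String payload, long publishTimeMillis) {
        this.payload = payload;
        this.publishTimeMillis = publishTimeMillis;
    }
}

// Hypothetical dispatcher contract: decide what may be delivered right now.
interface Dispatcher {
    List<String> dispatch(List<TimedMessage> backlog, long nowMillis);
}

// Existing behaviour, left untouched: everything in the backlog is deliverable.
final class ImmediateDispatcher implements Dispatcher {
    @Override
    public List<String> dispatch(List<TimedMessage> backlog, long nowMillis) {
        List<String> out = new ArrayList<>();
        for (TimedMessage m : backlog) {
            out.add(m.payload);
        }
        return out;
    }
}

// Isolated fixed-delay logic: stop at the first message that is still too young.
// Assumes the backlog is ordered by publish time, so no sorting is needed.
final class FixedDelayDispatcher implements Dispatcher {
    private final long delayMillis;

    FixedDelayDispatcher(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    @Override
    public List<String> dispatch(List<TimedMessage> backlog, long nowMillis) {
        List<String> out = new ArrayList<>();
        for (TimedMessage m : backlog) {
            if (m.publishTimeMillis + delayMillis > nowMillis) {
                break; // later messages are younger still, so stop here
            }
            out.add(m.payload);
        }
        return out;
    }
}

// Only subscriptions that ask for a delay get the new dispatcher.
final class Dispatchers {
    private Dispatchers() {}

    static Dispatcher forSubscription(long delayMillis) {
        return delayMillis > 0
                ? new FixedDelayDispatcher(delayMillis)
                : new ImmediateDispatcher();
    }
}
```

With this shape, Dispatchers.forSubscription(0) returns the unchanged
dispatcher, so users who do not enable a delayed subscription would not
exercise the new code at all.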


>
> --
> *Ezequiel Lovelle*
>
>
> On Sat, 2 Mar 2019 at 08:45, Ali Ahmed <ah...@gmail.com> wrote:
>
> > Seems like we are implementing per message timers.
> >
> > Not aware of any log pub-sub system that does that except RocketMQ; not
> > sure how performant that is.
> >
> > https://github.com/apache/rocketmq/blob/2b692c912d18c0f9889fd73358581bcccf37bbbe/store/src/main/java/org/apache/rocketmq/store/schedule/ScheduleMessageService.java
> >
> > Seems simpler to just have delay on a topic level.  The cursor for client
> > subscriptions can make messages available after a delay.
> > I don't know if we can achieve significant throughput with so many active
> > timers.
> >
> > On Sat, Mar 2, 2019 at 2:49 AM Sijie Guo <gu...@gmail.com> wrote:
> >
> > > I am trying to draw a conclusion on this email thread.
> > >
> > > > Maybe some way to plug to the broker some logic without
> > > interfering with its core?
> > > >  In our business fixed delay at consumer level regardless of any
> > producer
> > > > configuration is a big win due to easy implementation and usage.
> > >
> > > Based on Ezequiel's last comment, if we are able to find a way to plug
> a
> > > new "fixed delay" dispatcher without touching other dispatcher logic,
> is
> > > that a good approach for the community to proceed on this direction?
> > >
> > > - Sijie
> > >
> > >
> > > On Wed, Feb 20, 2019 at 8:26 AM 李鹏辉gmail <co...@gmail.com>
> > wrote:
> > >
> > > > > Sorry to hear that DLQ causes GC.
> > > > >
> > > > > I agree with what was discussed before: the dispatcher is a
> > > > > performance-sensitive piece of code. If we make changes to the
> > > > > dispatcher, we must pay attention to memory overhead and blocking.
> > > > >
> > > > > I prefer the fixed delayed message solution (aka delay time levels).
> > > > > Users can define multiple topics with different delays. A topic is
> > > > > still a FIFO model.
> > > > >
> > > > > We can improve the user experience by packaging this in the client
> > > > > API: topics can be created automatically, and users can customize the
> > > > > delay levels.
> > > > >
> > > > > In our scenario, this already meets most of the needs. We currently
> > > > > depend on the DLQ feature, and we know from users that the experience
> > > > > is not very good: users need to maintain message expiry themselves.
> > > > >
> > > > > So, if we can avoid complexity of use and not impose a performance
> > > > > burden on message dispatching, I prefer implementing it on the broker
> > > > > side (the broker does not need to sort messages by time, it just needs
> > > > > to check whether the tail message can be dispatched; I don't think
> > > > > this will cause a dispatching performance problem).
> > > > >
> > > > > As for more complicated delayed messages (e.g. arbitrary delayed
> > > > > delivery), I don't think Pulsar needs to support such complicated
> > > > > scenarios (as we discussed before). In our scenario, we have more
> > > > > complicated message requirements (e.g. delayed messages that can be
> > > > > paused, stopped, and re-run, or cron messages).
> > > > >
> > > > > However, these cases are not very widely used.
> > > >
> > > > - Penghui
> > > >
> > > >
> > > > > > On Feb 20, 2019, at 06:37, Sebastián Schepens
> > > > > > <se...@mercadolibre.com.INVALID> wrote:
> > > > >
> > > > > Hi,
> > > > > I am really not into any details of the proposed implementation,
> but
> > > was
> > > > > just wondering, has anyone had a look at how Uber implemented this
> in
> > > > > Cherami? Cherami seems very similar to Pulsar, its storage system
> > also
> > > > > seems very similar to bookkeeper. They seem to implement delayed
> > queues
> > > > by
> > > > > storing the time as part of the key in rocksdb and using sorted
> > > > iterators,
> > > > > could this be done in Pulsar as well?
> > > > >
> > > > > Cheers,
> > > > > Sebastian
> > > > >
> > > > > On Tue, Feb 19, 2019 at 6:02 PM Dave Fisher <dave2wave@comcast.net
> >
> > > > wrote:
> > > > >
> > > > >> Hi -
> > > > >>
> > > > >> Well, it does, but can this be implemented without building a
> > > > delayQueue?
> > > > >> It seems to me that a delayQueue both breaks resiliency if the
> > broker
> > > > goes
> > > > >> down and would certainly add overhead. Perhaps my idea to discard
> > > > responses
> > > > >> that are too new and then retrieve once they are out of the
> delayed
> > > > >> timeframe would be simpler?
> > > > >>
> > > > >> Again I am somewhat naive to the details. I’m not sure that the
> path
> > > > >> through the code is kept to an absolute minimum when you have a
> > > Consumer
> > > > >> with a nonzero delay?
> > > > >>
> > > > >> Regards,
> > > > >> Dave
> > > > >>
> > > > >>> On Feb 19, 2019, at 12:39 PM, Ezequiel Lovelle <
> > > > >> ezequiellovelle@gmail.com> wrote:
> > > > >>>
> > > > >>> Hi Dave!
> > > > >>>
> > > > >>>> I wonder if clients can add an optional argument to the broker
> > call
> > > > when
> > > > >>> pulling events. The argument would be the amount of delay. Any
> > > messages
> > > > >>> younger than the delay are not returned by the broker.
> > > > >>>
> > > > >>> This is exactly what https://github.com/apache/pulsar/pull/3155
> > does
> > > > :).
> > > > >>> We still need to decide if we want to add this feature at client
> > side
> > > > or
> > > > >>> broker side, the pull request does it on the broker.
> > > > >>>
> > > > >>> --
> > > > >>> *Ezequiel Lovelle*
> > > > >>>
> > > > >>>
> > > > >>> On Tue, 19 Feb 2019 at 17:06, Dave Fisher <dave2wave@comcast.net
> >
> > > > wrote:
> > > > >>>
> > > > >>>> Hi -
> > > > >>>>
> > > > >>>> My thoughts here may be completely useless but I wonder if
> clients
> > > can
> > > > >> add
> > > > >>>> an optional argument to the broker call when pulling events. The
> > > > >> argument
> > > > >>>> would be the amount of delay. Any messages younger than the
> delay
> > > are
> > > > >> not
> > > > >>>> returned by the broker.
> > > > >>>>
> > > > >>>> Regards,
> > > > >>>> Dave
> > > > >>>>
> > > > >>>>> On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <
> > > > >>>> ezequiellovelle@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> The recent changes made to support DLQ caused major problems
> > with
> > > > >>>> garbage
> > > > >>>>> collection
> > > > >>>>>
> > > > >>>>> If garbage collection is a big concern maybe we could add some
> > > config
> > > > >>>>> parameter on the broker to disable the usage of this feature
> and
> > > > return
> > > > >>>>> BrokerMetadataException in this situation, giving the power to
> > the
> > > > >>>>> administrator whether to offer this feature or not.
> > > > >>>>>
> > > > >>>>>> is it acceptable to do it at broker side?
> > > > >>>>>
> > > > >>>>> I think this is the big question that needs to be answered.
> > > > >>>>>
> > > > >>>>>> can we just have a separated dispatcher for fixed delayed
> > > > >> subscription?
> > > > >>>>>
> > > > >>>>> I will try to do a completely new approach, simpler, and more
> > > > isolated
> > > > >>>>> from broker logic. Maybe some way to plug to the broker some
> > logic
> > > > >>>> without
> > > > >>>>> interfering with its core?
> > > > >>>>>
> > > > >>>>> In our business fixed delay at consumer level regardless of any
> > > > >> producer
> > > > >>>>> configuration is a big win due to easy implementation and
> usage.
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> *Ezequiel Lovelle*
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Wed, 13 Feb 2019 at 23:25, Sijie Guo <gu...@gmail.com>
> > > wrote:
> > > > >>>>>
> > > > >>>>>> Agreed that dispatcher is a performance sensitive piece of
> code.
> > > > Feel
> > > > >>>> bad
> > > > >>>>>> to hear that DLQ causes GC. Are there any issues tracking
> those
> > > > items
> > > > >>>> you
> > > > >>>>>> guys identified with DLQ changes?
> > > > >>>>>>
> > > > >>>>>>> How is this different from a subscription running behind?
> > > > >>>>>>
> > > > >>>>>> As far as I understand form the discussion at #3155, I don't
> > think
> > > > >>>> there is
> > > > >>>>>> a fundamental difference from a backlogged subscriber.
> > > > >>>>>> The discussion point will mainly be - if a delayed
> subscription
> > > can
> > > > be
> > > > >>>>>> implemented with a simpler approach at broker side without
> > > changing
> > > > >>>> other
> > > > >>>>>> dispatcher logic,
> > > > >>>>>> is it acceptable to do it at broker side? So we don't have to
> > > > >>>> reimplement
> > > > >>>>>> the same mechanism at different language clients. I think
> that's
> > > the
> > > > >>>> same
> > > > >>>>>> tradeoff we were discussing for generic delayed messages.
> > > > >>>>>>
> > > > >>>>>> My thought would be - can we just have a separated dispatcher
> > for
> > > > >> fixed
> > > > >>>>>> delayed subscription? The logic can be ISOLATED from other
> > normal
> > > > >>>>>> dispatchers. if users don't enable delayed subscription, they
> > will
> > > > not
> > > > >>>>>> exercise that dispatcher. This can be a good direction to
> > explore
> > > > for
> > > > >>>>>> future changes that are related to dispatchers.
> > > > >>>>>>
> > > > >>>>>> - Sijie
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Thu, Feb 14, 2019 at 8:43 AM Joe F <jo...@gmail.com>
> > > > wrote:
> > > > >>>>>>
> > > > >>>>>>> Delayed subscription is simpler, and probably worth doing in
> > the
> > > > >> broker
> > > > >>>>>> IF
> > > > >>>>>>> done right.
> > > > >>>>>>>
> > > > >>>>>>> How is this different from a subscription running behind?
> Why
> > > does
> > > > >>>>>>> supporting that require this complex a change in the
> > dispatcher,
> > > > when
> > > > >>>> we
> > > > >>>>>>> already support backlogged subscribers?
> > > > >>>>>>>
> > > > >>>>>>> I am extremely wary of changes in the dispatcher. The recent
> > > > changes
> > > > >>>> made
> > > > >>>>>>> to support DLQ caused major problems with garbage collection,
> > > > broker
> > > > >>>>>>> failure  and service interruptions for us. Even though we ARE
> > NOT
> > > > >> using
> > > > >>>>>> the
> > > > >>>>>>> DLQ feature. Not a pleasant experience.
> > > > >>>>>>>
> > > > >>>>>>> This is a very performance sensitive piece of code, and it
> > should
> > > > be
> > > > >>>>>>> treated as such.
> > > > >>>>>>>
> > > > >>>>>>> Joe
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Feb 13, 2019 at 3:58 PM Sijie Guo <
> guosijie@gmail.com>
> > > > >> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi all,
> > > > >>>>>>>>
> > > > >>>>>>>> I am going to wrap up the discussion regarding delayed
> > delivery
> > > > use
> > > > >>>>>>> cases.
> > > > >>>>>>>>
> > > > >>>>>>>> For arbitrary delayed delivery, there are a few +1s to doing
> > > > PIP-26
> > > > >> in
> > > > >>>>>>>> functions. I am assuming that we will go down this path,
> > unless
> > > > >> there
> > > > >>>>>> are
> > > > >>>>>>>> other proposals.
> > > > >>>>>>>>
> > > > >>>>>>>> However there is a use case Lovelle pointed out about "Fixed
> > > > Delayed
> > > > >>>>>>>> Message". More specifically it is
> > > > >>>>>>>> https://github.com/apache/pulsar/pull/3155
> > > > >>>>>>>> (The caption in #3155 is a bit misleading). IMO it is a
> > "delayed
> > > > >>>>>>>> subscription", basically all messages in the subscription is
> > > > delayed
> > > > >>>> to
> > > > >>>>>>>> dispatch in a given time interval. The consensus of this
> > feature
> > > > is
> > > > >>>> not
> > > > >>>>>>> yet
> > > > >>>>>>>> achieved. Basically, there will be two approaches for this:
> > > > >>>>>>>>
> > > > >>>>>>>> a) DONT treat "fixed delayed message" as a different case.
> > Just
> > > > use
> > > > >>>> the
> > > > >>>>>>>> same approach as in PIP-26.
> > > > >>>>>>>> b) treat "fixed delayed message" as a different case, e.g.
> we
> > > can
> > > > >>>>>> better
> > > > >>>>>>>> call it "delayed subscription" or whatever can distinguish
> it
> > > from
> > > > >>>>>>> general
> > > > >>>>>>>> arbitrary delayed delivery. Use the approach
> > proposed/discussed
> > > in
> > > > >>>>>> #3155.
> > > > >>>>>>>>
> > > > >>>>>>>> I would like the community to discuss this and also come to
> an
> > > > >>>>>> agreement.
> > > > >>>>>>>> So Lovelle can move forward with the approach agreed by the
> > > > >> community.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
> > > > >>>>>>>> Sijie
> > > > >>>>>>>>
> > > > >>>>>>>> On Tue, Jan 29, 2019 at 6:30 AM Ezequiel Lovelle <
> > > > >>>>>>>> ezequiellovelle@gmail.com>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> "I agree, but that is *not what #3155 tries to achieve."
> > > > >>>>>>>>>
> > > > >>>>>>>>> This typo made this phrase nonsense, sorry!
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, 28 Jan 2019, 16:44 Ezequiel Lovelle <
> > > > >>>>>> ezequiellovelle@gmail.com
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>>> What exactly is the delayed delivery use case?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> This is helpful on systems relaying on pulsar for
> persistent
> > > > >>>>>>> guarantees
> > > > >>>>>>>>>> and using it for synchronization or some sort of checks,
> but
> > > on
> > > > >>>>>> such
> > > > >>>>>>>>>> systems is common to have some overhead committing data on
> > > > >>>>>> persistent
> > > > >>>>>>>>>> storage maybe due to buffered mechanism or distributing
> the
> > > data
> > > > >>>>>>> across
> > > > >>>>>>>>>> the network before being available.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Surely would be more use cases I don't came across right
> > now.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Random insertion and deletion is not what FIFO queues
> like
> > > > Pulsar
> > > > >>>>>>> are
> > > > >>>>>>>>>> designed for.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I agree, but that is now what #3155 tries to achieve.
> #3155
> > is
> > > > >>>>>> just a
> > > > >>>>>>>>>> fixed delay for all message in a consumer, that's the
> reason
> > > > that
> > > > >>>>>> the
> > > > >>>>>>>>>> implementation of #3155 is quite trivial.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> +1 from me for doing PIP-26 in functions.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> --
> > > > >>>>>>>>>> *Ezequiel Lovelle*
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Sat, 26 Jan 2019 at 09:57, Yuva raj <uvaraj6@gmail.com
> >
> > > > wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Considering the way pulsar is built +1 for doing PIP-26
> in
> > > > >>>>>>> functions.
> > > > >>>>>>>> I
> > > > >>>>>>>>> am
> > > > >>>>>>>>>>> more of thinking in a way like publish it pulsar we will
> > make
> > > > it
> > > > >>>>>>>>> available
> > > > >>>>>>>>>>> in a different queuing system if you need priority and
> > delay
> > > > >>>>>>> messages
> > > > >>>>>>>>>>> support. Pulsar functions would go enough for this kind
> of
> > > use
> > > > >>>>>>> cases.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, 25 Jan 2019 at 22:29, Ivan Kelly <
> ivank@apache.org
> > >
> > > > >>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>>> Correct. PIP-26 can be implemented in Functions. I
> > believe
> > > > the
> > > > >>>>>>>> last
> > > > >>>>>>>>>>>>> discussion in PIP-26 thread kind of agree on functions
> > > > >>>>>> approach.
> > > > >>>>>>>>>>>>> If the community is okay with PIP-26 in functions, I
> > think
> > > > >>>>>> that
> > > > >>>>>>> is
> > > > >>>>>>>>>>>> probably
> > > > >>>>>>>>>>>>> a good approach to start.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> +1 for doing it in functions.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> -Ivan
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>> *Thanks*
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> *Yuvaraj L*
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
> >
> > --
> > -Ali
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

> Seems like we are implementing per message timers.

As per PR #3155 <https://github.com/apache/pulsar/pull/3155>, no. A message
won't have its own Timer object, just a long field holding its expiration
deadline, and there will be one, and only one, scheduled task per consumer
at any given time.
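
As a rough illustration of "a long deadline per message, one scheduled task
per consumer", here is a minimal Java sketch. It is not the code from #3155:
the class name FixedDelayTracker and its methods are hypothetical, the clock
is injected so the logic can be checked without sleeping, and it assumes
messages are added in publish order (so deadlines never decrease and a plain
FIFO queue is enough).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical per-consumer tracker; names are illustrative only.
final class FixedDelayTracker {

    // A pending message is just an id and a long deadline, not a Timer object.
    private static final class Entry {
        final long messageId;
        final long deadlineMillis;

        Entry(long messageId, long deadlineMillis) {
            this.messageId = messageId;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final ArrayDeque<Entry> pending = new ArrayDeque<>();
    private final long delayMillis;
    private final LongSupplier clock;

    FixedDelayTracker(long delayMillis, LongSupplier clock) {
        this.delayMillis = delayMillis;
        this.clock = clock;
    }

    // Record a message; assumes publish times arrive in non-decreasing order.
    void add(long messageId, long publishTimeMillis) {
        pending.addLast(new Entry(messageId, publishTimeMillis + delayMillis));
    }

    // Remove and return the ids whose deadline has passed, oldest first.
    List<Long> pollReady() {
        long now = clock.getAsLong();
        List<Long> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peekFirst().deadlineMillis <= now) {
            ready.add(pending.pollFirst().messageId);
        }
        return ready;
    }

    // How long the single scheduled task should wait before checking again,
    // or -1 when nothing is pending and no task needs to be scheduled.
    long millisUntilNext() {
        Entry head = pending.peekFirst();
        if (head == null) {
            return -1L;
        }
        return Math.max(0L, head.deadlineMillis - clock.getAsLong());
    }
}
```

A consumer could then keep a single task on a ScheduledExecutorService that
calls pollReady() and reschedules itself after millisUntilNext(), instead of
creating a timer per message.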

> Seems simpler to just have delay on a topic level.

I think the complexity would be very similar on both sides (producer/consumer).
An important aspect here is the decision to provide this feature (delaying
messages on the consumer) separately from the producer, so the consumer can
decide to 'delay' all messages regardless of the producer.

> if we are able to find a way to plug a new "fixed delay" dispatcher
> without touching other dispatcher logic, is that a good approach for the
> community to proceed on this direction?

Great question! I like this path.

One solution I can think of is something similar to what Matteo did here:
https://github.com/apache/pulsar/pull/3615

So, we can have a separate class that handles consumers with delay and
extends the normal consumer base. The problem with this approach would come
in the future, if we want consumers that combine multiple behaviours,
e.g. a delayed consumer plus some future feature not present right now.

Anyway, if everyone agrees with Sijie's question, we might discuss this in a
separate thread.

--
*Ezequiel Lovelle*


On Sat, 2 Mar 2019 at 08:45, Ali Ahmed <ah...@gmail.com> wrote:

> Seems like we are implementing per message timers.
>
> Not aware of any log pub-sub system that does that except RocketMQ; not
> sure how performant that is.
>
> https://github.com/apache/rocketmq/blob/2b692c912d18c0f9889fd73358581bcccf37bbbe/store/src/main/java/org/apache/rocketmq/store/schedule/ScheduleMessageService.java
>
> Seems simpler to just have delay on a topic level.  The cursor for client
> subscriptions can make messages available after a delay.
> I don't know if we can achieve significant throughput with so many active
> timers.
>
> On Sat, Mar 2, 2019 at 2:49 AM Sijie Guo <gu...@gmail.com> wrote:
>
> > I am trying to draw a conclusion on this email thread.
> >
> > > Maybe some way to plug to the broker some logic without
> > interfering with its core?
> > >  In our business fixed delay at consumer level regardless of any
> producer
> > > configuration is a big win due to easy implementation and usage.
> >
> > Based on Ezequiel's last comment, if we are able to find a way to plug a
> > new "fixed delay" dispatcher without touching other dispatcher logic, is
> > that a good approach for the community to proceed on this direction?
> >
> > - Sijie
> >
> >
> > On Wed, Feb 20, 2019 at 8:26 AM 李鹏辉gmail <co...@gmail.com>
> wrote:
> >
> > > > Sorry to hear that DLQ causes GC.
> > > >
> > > > I agree with what was discussed before: the dispatcher is a
> > > > performance-sensitive piece of code. If we make changes to the
> > > > dispatcher, we must pay attention to memory overhead and blocking.
> > > >
> > > > I prefer the fixed delayed message solution (aka delay time levels).
> > > > Users can define multiple topics with different delays. A topic is
> > > > still a FIFO model.
> > > >
> > > > We can improve the user experience by packaging this in the client
> > > > API: topics can be created automatically, and users can customize the
> > > > delay levels.
> > > >
> > > > In our scenario, this already meets most of the needs. We currently
> > > > depend on the DLQ feature, and we know from users that the experience
> > > > is not very good: users need to maintain message expiry themselves.
> > > >
> > > > So, if we can avoid complexity of use and not impose a performance
> > > > burden on message dispatching, I prefer implementing it on the broker
> > > > side (the broker does not need to sort messages by time, it just needs
> > > > to check whether the tail message can be dispatched; I don't think
> > > > this will cause a dispatching performance problem).
> > > >
> > > > As for more complicated delayed messages (e.g. arbitrary delayed
> > > > delivery), I don't think Pulsar needs to support such complicated
> > > > scenarios (as we discussed before). In our scenario, we have more
> > > > complicated message requirements (e.g. delayed messages that can be
> > > > paused, stopped, and re-run, or cron messages).
> > > >
> > > > However, these cases are not very widely used.
> > >
> > > - Penghui
> > >
> > >
> > > > > On Feb 20, 2019, at 06:37, Sebastián Schepens
> > > > > <se...@mercadolibre.com.INVALID> wrote:
> > > >
> > > > Hi,
> > > > I am really not into any details of the proposed implementation, but
> > was
> > > > just wondering, has anyone had a look at how Uber implemented this in
> > > > Cherami? Cherami seems very similar to Pulsar, its storage system
> also
> > > > seems very similar to bookkeeper. They seem to implement delayed
> queues
> > > by
> > > > storing the time as part of the key in rocksdb and using sorted
> > > iterators,
> > > > could this be done in Pulsar as well?
> > > >
> > > > Cheers,
> > > > Sebastian
> > > >
> > > > On Tue, Feb 19, 2019 at 6:02 PM Dave Fisher <da...@comcast.net>
> > > wrote:
> > > >
> > > >> Hi -
> > > >>
> > > >> Well, it does, but can this be implemented without building a
> > > delayQueue?
> > > >> It seems to me that a delayQueue both breaks resiliency if the
> broker
> > > goes
> > > >> down and would certainly add overhead. Perhaps my idea to discard
> > > responses
> > > >> that are too new and then retrieve once they are out of the delayed
> > > >> timeframe would be simpler?
> > > >>
> > > >> Again I am somewhat naive to the details. I’m not sure that the path
> > > >> through the code is kept to an absolute minimum when you have a
> > Consumer
> > > >> with a nonzero delay?
> > > >>
> > > >> Regards,
> > > >> Dave
> > > >>
> > > >>> On Feb 19, 2019, at 12:39 PM, Ezequiel Lovelle <
> > > >> ezequiellovelle@gmail.com> wrote:
> > > >>>
> > > >>> Hi Dave!
> > > >>>
> > > >>>> I wonder if clients can add an optional argument to the broker
> call
> > > when
> > > >>> pulling events. The argument would be the amount of delay. Any
> > messages
> > > >>> younger than the delay are not returned by the broker.
> > > >>>
> > > >>> This is exactly what https://github.com/apache/pulsar/pull/3155
> does
> > > :).
> > > >>> We still need to decide if we want to add this feature at client
> side
> > > or
> > > >>> broker side, the pull request does it on the broker.
> > > >>>
> > > >>> --
> > > >>> *Ezequiel Lovelle*
> > > >>>
> > > >>>
> > > >>> On Tue, 19 Feb 2019 at 17:06, Dave Fisher <da...@comcast.net>
> > > wrote:
> > > >>>
> > > >>>> Hi -
> > > >>>>
> > > >>>> My thoughts here may be completely useless but I wonder if clients
> > can
> > > >> add
> > > >>>> an optional argument to the broker call when pulling events. The
> > > >> argument
> > > >>>> would be the amount of delay. Any messages younger than the delay
> > are
> > > >> not
> > > >>>> returned by the broker.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Dave
> > > >>>>
> > > >>>>> On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <
> > > >>>> ezequiellovelle@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> The recent changes made to support DLQ caused major problems
> with
> > > >>>> garbage
> > > >>>>> collection
> > > >>>>>
> > > >>>>> If garbage collection is a big concern maybe we could add some
> > config
> > > >>>>> parameter on the broker to disable the usage of this feature and
> > > return
> > > >>>>> BrokerMetadataException in this situation, giving the power to
> the
> > > >>>>> administrator whether to offer this feature or not.
> > > >>>>>
> > > >>>>>> is it acceptable to do it at broker side?
> > > >>>>>
> > > >>>>> I think this is the big question that needs to be answered.
> > > >>>>>
> > > >>>>>> can we just have a separated dispatcher for fixed delayed
> > > >> subscription?
> > > >>>>>
> > > >>>>> I will try to do a completely new approach, simpler, and more
> > > isolated
> > > >>>>> from broker logic. Maybe some way to plug to the broker some
> logic
> > > >>>> without
> > > >>>>> interfering with its core?
> > > >>>>>
> > > >>>>> In our business fixed delay at consumer level regardless of any
> > > >> producer
> > > >>>>> configuration is a big win due to easy implementation and usage.
> > > >>>>>
> > > >>>>> --
> > > >>>>> *Ezequiel Lovelle*
> > > >>>>>
> > > >>>>>
> > > >>>>> On Wed, 13 Feb 2019 at 23:25, Sijie Guo <gu...@gmail.com>
> > wrote:
> > > >>>>>
> > > >>>>>> Agreed that dispatcher is a performance sensitive piece of code.
> > > Feel
> > > >>>> bad
> > > >>>>>> to hear that DLQ causes GC. Are there any issues tracking those
> > > items
> > > >>>> you
> > > >>>>>> guys identified with DLQ changes?
> > > >>>>>>
> > > >>>>>>> How is this different from a subscription running behind?
> > > >>>>>>
> > > >>>>>> As far as I understand form the discussion at #3155, I don't
> think
> > > >>>> there is
> > > >>>>>> a fundamental difference from a backlogged subscriber.
> > > >>>>>> The discussion point will mainly be - if a delayed subscription
> > can
> > > be
> > > >>>>>> implemented with a simpler approach at broker side without
> > changing
> > > >>>> other
> > > >>>>>> dispatcher logic,
> > > >>>>>> is it acceptable to do it at broker side? So we don't have to
> > > >>>> reimplement
> > > >>>>>> the same mechanism at different language clients. I think that's
> > the
> > > >>>> same
> > > >>>>>> tradeoff we were discussing for generic delayed messages.
> > > >>>>>>
> > > >>>>>> My thought would be - can we just have a separated dispatcher
> for
> > > >> fixed
> > > >>>>>> delayed subscription? The logic can be ISOLATED from other
> normal
> > > >>>>>> dispatchers. if users don't enable delayed subscription, they
> will
> > > not
> > > >>>>>> exercise that dispatcher. This can be a good direction to
> explore
> > > for
> > > >>>>>> future changes that are related to dispatchers.
> > > >>>>>>
> > > >>>>>> - Sijie
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Feb 14, 2019 at 8:43 AM Joe F <jo...@gmail.com>
> > > wrote:
> > > >>>>>>
> > > >>>>>>> Delayed subscription is simpler, and probably worth doing in
> the
> > > >> broker
> > > >>>>>> IF
> > > >>>>>>> done right.
> > > >>>>>>>

Re: [DISCUSSION] Delayed message delivery

Posted by Ali Ahmed <ah...@gmail.com>.

It seems like we are implementing per-message timers.

I am not aware of any log-based pub-sub system that does that except RocketMQ, and I am not sure how
performant that is.
https://github.com/apache/rocketmq/blob/2b692c912d18c0f9889fd73358581bcccf37bbbe/store/src/main/java/org/apache/rocketmq/store/schedule/ScheduleMessageService.java

It seems simpler to just have a delay at the topic level. The cursor for client
subscriptions can make messages available after a delay.
I don't know if we can achieve significant throughput with so many active
timers.
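
To illustrate the topic-level delay idea, here is a minimal sketch in Java. It is not Pulsar code: the class FixedDelayGate and its methods are hypothetical names, and it assumes that publish times are non-decreasing in log order. Under that assumption no per-message timer is needed, because only the oldest undelivered entry has to be compared with the clock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: one fixed delay applied to a whole FIFO backlog. */
class FixedDelayGate {
    /** A message reduced to the two fields the gate needs. */
    static final class Entry {
        final long id;
        final long publishTimeMs;

        Entry(long id, long publishTimeMs) {
            this.id = id;
            this.publishTimeMs = publishTimeMs;
        }
    }

    private final long delayMs;
    private final Queue<Entry> backlog = new ArrayDeque<>();

    FixedDelayGate(long delayMs) {
        this.delayMs = delayMs;
    }

    /** Appends a message; callers are assumed to publish in time order. */
    void publish(long id, long publishTimeMs) {
        backlog.add(new Entry(id, publishTimeMs));
    }

    /**
     * Returns the ids of all messages that are at least delayMs old at nowMs.
     * The scan stops at the first entry that is still too young, since every
     * later entry in the FIFO backlog was published at the same time or later.
     */
    List<Long> poll(long nowMs) {
        List<Long> ready = new ArrayList<>();
        while (!backlog.isEmpty() && backlog.peek().publishTimeMs + delayMs <= nowMs) {
            ready.add(backlog.remove().id);
        }
        return ready;
    }
}
```

The cost per poll is proportional to the number of messages released, not to the number of messages waiting, which is the property that would distinguish this from keeping one active timer per message.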

On Sat, Mar 2, 2019 at 2:49 AM Sijie Guo <gu...@gmail.com> wrote:

> I am trying to draw a conclusion on this email thread.
>
> > Maybe some way to plug to the broker some logic without
> > interfering with its core?
> > In our business fixed delay at consumer level regardless of any producer
> > configuration is a big win due to easy implementation and usage.
>
> Based on Ezequiel's last comment, if we are able to find a way to plug a
> new "fixed delay" dispatcher without touching other dispatcher logic, is
> that a good approach for the community to proceed on this direction?
>
> - Sijie


-- 
-Ali

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

I am trying to draw a conclusion on this email thread.

> Maybe some way to plug to the broker some logic without
> interfering with its core?
> In our business fixed delay at consumer level regardless of any producer
> configuration is a big win due to easy implementation and usage.

Based on Ezequiel's last comment: if we are able to find a way to plug in a
new "fixed delay" dispatcher without touching other dispatcher logic, is
that a good direction for the community to proceed in?

- Sijie
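
As a rough illustration of what "plugging in" a fixed-delay dispatcher without touching other dispatcher logic could look like, here is a small Java sketch. All names (Dispatcher, ImmediateDispatcher, FixedDelayDispatcher, Dispatchers.forSubscription) are hypothetical and are not the broker's actual classes. The point is only that a subscription without a delay gets an object that contains no delay code at all.

```java
import java.util.function.LongSupplier;

/** Hypothetical extension point: decides whether a message may be dispatched now. */
interface Dispatcher {
    boolean canDispatch(long publishTimeMs);
}

/** Default path: no delay logic is present, so nothing is added to the hot path. */
final class ImmediateDispatcher implements Dispatcher {
    @Override
    public boolean canDispatch(long publishTimeMs) {
        return true;
    }
}

/** Isolated fixed-delay path, only created when a subscription asks for a delay. */
final class FixedDelayDispatcher implements Dispatcher {
    private final long delayMs;
    private final LongSupplier clock;

    FixedDelayDispatcher(long delayMs, LongSupplier clock) {
        this.delayMs = delayMs;
        this.clock = clock;
    }

    @Override
    public boolean canDispatch(long publishTimeMs) {
        return publishTimeMs + delayMs <= clock.getAsLong();
    }
}

/** Selects the dispatcher once, when the subscription is created. */
final class Dispatchers {
    private Dispatchers() {
    }

    static Dispatcher forSubscription(long delayMs, LongSupplier clock) {
        return delayMs <= 0
                ? new ImmediateDispatcher()
                : new FixedDelayDispatcher(delayMs, clock);
    }
}
```

In production the clock would presumably be System::currentTimeMillis; it is passed in here only so the behaviour can be checked with a fixed time.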



Re: [DISCUSSION] Delayed message delivery

Posted by 李鹏辉gmail <co...@gmail.com>.

Sorry to hear that DLQ causes GC problems.

As discussed before, I agree that the dispatcher is a performance-sensitive piece of code.
If we make changes to the dispatcher, we must pay attention to memory
overhead and blocking.

I prefer the fixed delayed message solution (aka delay time levels). Users
can define multiple topics with different delays. Each topic is still a FIFO model.

We can improve the user experience by packaging this in the client API: topics can be created
automatically, and users can customize the delay levels.

In our scenario, this already meets most of our needs. Currently we depend
on the DLQ feature, and we know from our users that the experience is not very good:
users need to handle message expiry themselves.

So, if we can avoid complexity of use without imposing a performance burden
on message dispatching, I prefer implementing it on the broker side (the broker does not
need to sort messages by time; it only needs to check whether the tail message
can be dispatched, and I don't think this will cause a dispatching performance problem).

As for more complicated delayed messages (e.g. arbitrary delayed delivery),
I don't think Pulsar needs to support such complicated scenarios (as we discussed before).
In our scenario, we do have more complicated message requirements (e.g. delayed messages that can be
paused, stopped, and re-run, or cron messages).

However, these cases are not very widely used.

- Penghui
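
A minimal sketch of the "delay level" idea in Java, under stated assumptions: the class DelayLevelRouter and the topic naming scheme ("<base>-delay-<n>ms") are made up for illustration and do not come from Pulsar. The sketch also assumes that a requested delay is rounded up to the next configured level so that a message is never delivered early; rounding down would be an equally possible choice.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side helper: maps a requested delay to a per-level topic. */
class DelayLevelRouter {
    private final String baseTopic;
    // Sorted by delay so the nearest level can be found with ceilingEntry().
    private final TreeMap<Long, String> levels = new TreeMap<>();

    DelayLevelRouter(String baseTopic, long... delayLevelsMs) {
        this.baseTopic = baseTopic;
        for (long level : delayLevelsMs) {
            // The topic naming scheme is an assumption made for this sketch.
            levels.put(level, baseTopic + "-delay-" + level + "ms");
        }
    }

    /**
     * Returns the topic to publish to. Each level topic stays a plain FIFO topic
     * with one fixed delay; only the choice of topic depends on the message.
     */
    String topicFor(long requestedDelayMs) {
        if (requestedDelayMs <= 0) {
            return baseTopic;
        }
        Map.Entry<Long, String> level = levels.ceilingEntry(requestedDelayMs);
        if (level == null) {
            throw new IllegalArgumentException(
                    "no delay level covers " + requestedDelayMs + " ms");
        }
        return level.getValue();
    }
}
```

For example, with levels of 1 s, 5 s and 60 s, a message that asks for a 3 s delay would be published to the 5 s topic, and a consumer of that topic would apply the same fixed 5 s delay to everything it reads.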


> On Feb 20, 2019, at 06:37, Sebastián Schepens <se...@mercadolibre.com.INVALID> wrote:
> 
> Hi,
> I am really not into any details of the proposed implementation, but was
> just wondering, has anyone had a look at how Uber implemented this in
> Cherami? Cherami seems very similar to Pulsar, its storage system also
> seems very similar to bookkeeper. They seem to implement delayed queues by
> storing the time as part of the key in rocksdb and using sorted iterators,
> could this be done in Pulsar as well?
> 
> Cheers,
> Sebastian

Re: [DISCUSSION] Delayed message delivery

Posted by Sebastián Schepens <se...@mercadolibre.com.INVALID>.

Hi,
I am not really familiar with the details of the proposed implementation, but I was
just wondering: has anyone had a look at how Uber implemented this in
Cherami? Cherami seems very similar to Pulsar, and its storage system also
seems very similar to BookKeeper. They seem to implement delayed queues by
storing the time as part of the key in RocksDB and using sorted iterators.
Could this be done in Pulsar as well?
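
To make the question concrete, here is a rough sketch of the "time as part of the key" idea, using an in-memory sorted list in place of RocksDB. It is not how Cherami or Pulsar implement anything; `TimeKeyedStore`, `put`, and `pop_due` are hypothetical names, and the key layout (deliver-at time followed by a sequence number) is an assumption.

```python
import bisect
import itertools


class TimeKeyedStore:
    """In-memory stand-in for a sorted key-value store keyed by (deliver_at, sequence)."""

    def __init__(self):
        self._keys = []                 # kept sorted: (deliver_at, seq)
        self._values = {}               # key -> payload
        self._seq = itertools.count()   # tie-breaker for equal deliver_at times

    def put(self, deliver_at, payload):
        key = (deliver_at, next(self._seq))
        bisect.insort(self._keys, key)
        self._values[key] = payload

    def pop_due(self, now):
        # Every key that sorts before (now, +inf) has deliver_at <= now,
        # so a single ordered scan from the start yields all due messages.
        cut = bisect.bisect_right(self._keys, (now, float("inf")))
        due, self._keys = self._keys[:cut], self._keys[cut:]
        return [self._values.pop(key) for key in due]


store = TimeKeyedStore()
store.put(30, "late")
store.put(10, "early")
store.put(20, "middle")
due_at_20 = store.pop_due(now=20)
due_at_25 = store.pop_due(now=25)
due_at_30 = store.pop_due(now=30)
```

Because the time is the leading part of the key, messages with arbitrary delays come back in delivery order regardless of the order they were written in.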

Cheers,
Sebastian

On Tue, Feb 19, 2019 at 6:02 PM Dave Fisher <da...@comcast.net> wrote:

> Hi -
>
> Well, it does, but can this be implemented without building a delayQueue?
> It seems to me that a delayQueue both breaks resiliency if the broker goes
> down and would certainly add overhead. Perhaps my idea to discard responses
> that are too new and then retrieve once they are out of the delayed
> timeframe would be simpler?
>
> Again I am somewhat naive to the details. I’m not sure that the path
> through the code is kept to an absolute minimum when you have a Consumer
> with a nonzero delay?
>
> Regards,
> Dave

Re: [DISCUSSION] Delayed message delivery

Posted by Dave Fisher <da...@comcast.net>.

Hi -

Well, it does, but can this be implemented without building a delayQueue? It seems to me that a delayQueue both breaks resiliency if the broker goes down and would certainly add overhead. Perhaps my idea to discard responses that are too new and then retrieve once they are out of the delayed timeframe would be simpler?

Again I am somewhat naive to the details. I’m not sure that the path through the code is kept to an absolute minimum when you have a Consumer with a nonzero delay?
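
One way to picture the "discard what is too new, retrieve it later" idea without a separate delay queue, assuming messages in a subscription are ordered by publish time: read from the cursor and stop at the first message younger than the delay, which also says how long to wait before retrying. This is a sketch with hypothetical names (`read_deliverable`, the `(publish_time, payload)` tuples), not the code in #3155.

```python
def read_deliverable(backlog, cursor, delay, now):
    """Return (messages, new_cursor, retry_in) for a subscription with a fixed delay.

    `backlog` is a list of (publish_time, payload) tuples, assumed to be
    ordered by publish_time. `retry_in` is None when the reader is caught up.
    """
    out = []
    while cursor < len(backlog):
        publish_time, payload = backlog[cursor]
        age = now - publish_time
        if age < delay:
            # Too young: everything behind it is at least as young, so stop here
            # and report how long until this message becomes deliverable.
            return out, cursor, delay - age
        out.append(payload)
        cursor += 1
    return out, cursor, None


backlog = [(0, "m1"), (3, "m2"), (9, "m3")]
msgs, cursor, retry_in = read_deliverable(backlog, cursor=0, delay=5, now=10)
```

The only state carried between calls in this sketch is the cursor position, so nothing beyond the cursor would need to be rebuilt after a restart.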

Regards,
Dave

> On Feb 19, 2019, at 12:39 PM, Ezequiel Lovelle <ez...@gmail.com> wrote:
> 
> Hi Dave!
> 
>> I wonder if clients can add an optional argument to the broker call when
>> pulling events. The argument would be the amount of delay. Any messages
>> younger than the delay are not returned by the broker.
> 
> This is exactly what https://github.com/apache/pulsar/pull/3155 does :).
> We still need to decide if we want to add this feature at client side or
> broker side, the pull request does it on the broker.
> 
> --
> *Ezequiel Lovelle*

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

Hi Dave!

> I wonder if clients can add an optional argument to the broker call when
> pulling events. The argument would be the amount of delay. Any messages
> younger than the delay are not returned by the broker.

This is exactly what https://github.com/apache/pulsar/pull/3155 does :).
We still need to decide if we want to add this feature at client side or
broker side, the pull request does it on the broker.

--
*Ezequiel Lovelle*


On Tue, 19 Feb 2019 at 17:06, Dave Fisher <da...@comcast.net> wrote:

> Hi -
>
> My thoughts here may be completely useless but I wonder if clients can add
> an optional argument to the broker call when pulling events. The argument
> would be the amount of delay. Any messages younger than the delay are not
> returned by the broker.
>
> Regards,
> Dave
>
> > On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <
> ezequiellovelle@gmail.com> wrote:
> >
> >> The recent changes made to support DLQ caused major problems with
> garbage
> > collection
> >
> > If garbage collection is a big concern maybe we could add some config
> > parameter on the broker to disable the usage of this feature and return
> > BrokerMetadataException in this situation, giving the power to the
> > administrator whether to offer this feature or not.
> >
> >> is it acceptable to do it at broker side?
> >
> > I think this is the big question that needs to be answered.
> >
> >> can we just have a separated dispatcher for fixed delayed subscription?
> >
> > I will try to do a completely new approach, simpler, and more isolated
> > from broker logic. Maybe some way to plug to the broker some logic
> without
> > interfering with its core?
> >
> > In our business fixed delay at consumer level regardless of any producer
> > configuration is a big win due to easy implementation and usage.
> >
> > --
> > *Ezequiel Lovelle*
> >
> >
> > On Wed, 13 Feb 2019 at 23:25, Sijie Guo <gu...@gmail.com> wrote:
> >
> >> Agreed that dispatcher is a performance sensitive piece of code. Feel
> bad
> >> to hear that DLQ causes GC. Are there any issues tracking those items
> you
> >> guys identified with DLQ changes?
> >>
> >>> How is this different from a subscription running behind?
> >>
> >> As far as I understand form the discussion at #3155, I don't think
> there is
> >> a fundamental difference from a backlogged subscriber.
> >> The discussion point will mainly be - if a delayed subscription can be
> >> implemented with a simpler approach at broker side without changing
> other
> >> dispatcher logic,
> >> is it acceptable to do it at broker side? So we don't have to
> reimplement
> >> the same mechanism at different language clients. I think that's the
> same
> >> tradeoff we were discussing for generic delayed messages.
> >>
> >> My thought would be - can we just have a separated dispatcher for fixed
> >> delayed subscription? The logic can be ISOLATED from other normal
> >> dispatchers. if users don't enable delayed subscription, they will not
> >> exercise that dispatcher. This can be a good direction to explore for
> >> future changes that are related to dispatchers.
> >>
> >> - Sijie
> >>
> >>
> >>
>
>

Re: [DISCUSSION] Delayed message delivery

Posted by Dave Fisher <da...@comcast.net>.

Hi -

My thoughts here may be completely useless but I wonder if clients can add an optional argument to the broker call when pulling events. The argument would be the amount of delay. Any messages younger than the delay are not returned by the broker.
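
A minimal sketch of that idea, assuming a hypothetical broker-side read function. The names `read_batch`, `min_age_ms`, and `StoredMessage` are illustrative only and are not Pulsar APIs. Since messages in a topic are stored in publish order, the read can stop at the first message that is younger than the requested delay:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredMessage:
    """Hypothetical stand-in for a message held by the broker."""
    payload: str
    publish_time_ms: int


def read_batch(log: List[StoredMessage], cursor: int, max_messages: int,
               now_ms: int, min_age_ms: int = 0) -> List[StoredMessage]:
    """Return up to max_messages from cursor, stopping at the first young message.

    min_age_ms is the optional delay argument: a message is returned only if
    it was published at least min_age_ms before now_ms.
    """
    batch = []
    for msg in log[cursor:cursor + max_messages]:
        if now_ms - msg.publish_time_ms < min_age_ms:
            # Assumes publish times are non-decreasing, so every later
            # message is also too young.
            break
        batch.append(msg)
    return batch


log = [StoredMessage("a", 1_000), StoredMessage("b", 4_000), StoredMessage("c", 9_000)]
ready = read_batch(log, cursor=0, max_messages=10, now_ms=10_000, min_age_ms=5_000)
print([m.payload for m in ready])  # ['a', 'b']
```

With the default `min_age_ms=0` the function behaves like an ordinary read, so consumers that do not ask for a delay would see no change.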

Regards,
Dave

> On Feb 19, 2019, at 11:47 AM, Ezequiel Lovelle <ez...@gmail.com> wrote:
> 
>> The recent changes made to support DLQ caused major problems with garbage
>> collection
> 
> If garbage collection is a big concern, maybe we could add a config
> parameter on the broker to disable this feature and return
> BrokerMetadataException when a client requests it, letting the
> administrator decide whether to offer the feature.
> 
>> is it acceptable to do it at broker side?
> 
> I think this is the big question that needs to be answered.
> 
>> can we just have a separated dispatcher for fixed delayed subscription?
> 
> I will try a completely new approach, simpler and more isolated from the
> broker logic. Maybe there is some way to plug this logic into the broker
> without interfering with its core?
> 
> For our business, a fixed delay at the consumer level, regardless of any
> producer configuration, is a big win because it is easy to implement and use.
> 
> --
> *Ezequiel Lovelle*
> 
> 

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

> The recent changes made to support DLQ caused major problems with garbage
> collection

If garbage collection is a big concern, maybe we could add a config
parameter on the broker to disable this feature and return
BrokerMetadataException when a client requests it, letting the
administrator decide whether to offer the feature.
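
As a rough sketch of such a switch (the setting name `delayed_subscription_enabled` and the function `validate_subscribe` are assumptions for illustration, and the exception class below is only a local stand-in for the one named above), the check could happen once, when the subscription is requested:

```python
from dataclasses import dataclass


class BrokerMetadataException(Exception):
    """Local stand-in for the exception named in the message above."""


@dataclass
class BrokerConfig:
    # Hypothetical setting name; off by default so the feature is opt-in.
    delayed_subscription_enabled: bool = False


def validate_subscribe(config: BrokerConfig, delay_ms: int) -> int:
    """Return the accepted delay, or reject the request if the feature is off."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    if delay_ms > 0 and not config.delayed_subscription_enabled:
        raise BrokerMetadataException(
            "delayed subscription is disabled on this broker")
    return delay_ms


# A normal subscription (no delay) is always accepted.
print(validate_subscribe(BrokerConfig(), delay_ms=0))

# A delayed subscription is rejected while the feature is switched off.
try:
    validate_subscribe(BrokerConfig(), delay_ms=5_000)
except BrokerMetadataException as err:
    print("rejected:", err)
```

Rejecting at subscribe time keeps the check out of the per-message dispatch path.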

> is it acceptable to do it at broker side?

I think this is the big question that needs to be answered.

> can we just have a separated dispatcher for fixed delayed subscription?

I will try a completely new approach, simpler and more isolated from the
broker logic. Maybe there is some way to plug this logic into the broker
without interfering with its core?

For our business, a fixed delay at the consumer level, regardless of any
producer configuration, is a big win because it is easy to implement and use.

--
*Ezequiel Lovelle*


On Wed, 13 Feb 2019 at 23:25, Sijie Guo <gu...@gmail.com> wrote:

> Agreed that the dispatcher is a performance-sensitive piece of code. Sorry
> to hear that the DLQ changes caused GC problems. Are there any issues
> tracking the items you identified with the DLQ changes?
>
> > How is this different from a subscription running behind?
>
> As far as I understand from the discussion at #3155, I don't think there is
> a fundamental difference from a backlogged subscriber.
> The main discussion point is this: if a delayed subscription can be
> implemented with a simpler approach on the broker side, without changing
> other dispatcher logic, is it acceptable to do it on the broker side? Then
> we don't have to reimplement the same mechanism in each language client. I
> think that's the same tradeoff we were discussing for generic delayed
> messages.
>
> My thought would be: can we just have a separate dispatcher for fixed
> delayed subscriptions? The logic can be ISOLATED from the other, normal
> dispatchers. If users don't enable delayed subscription, they will not
> exercise that dispatcher. This can be a good direction to explore for
> future changes related to dispatchers.
>
> - Sijie
>
>

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

Agreed that the dispatcher is a performance-sensitive piece of code. Sorry
to hear that the DLQ changes caused GC problems. Are there any issues
tracking the items you identified with the DLQ changes?

> How is this different from a subscription running behind?

As far as I understand from the discussion at #3155, I don't think there is
a fundamental difference from a backlogged subscriber.
The main discussion point is this: if a delayed subscription can be
implemented with a simpler approach on the broker side, without changing
other dispatcher logic, is it acceptable to do it on the broker side? Then
we don't have to reimplement the same mechanism in each language client. I
think that's the same tradeoff we were discussing for generic delayed
messages.

My thought would be: can we just have a separate dispatcher for fixed
delayed subscriptions? The logic can be ISOLATED from the other, normal
dispatchers. If users don't enable delayed subscription, they will not
exercise that dispatcher. This can be a good direction to explore for
future changes related to dispatchers.
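
A rough sketch of what that isolation could look like. This is plain Python rather than broker code, and the class and function names (`NormalDispatcher`, `FixedDelayDispatcher`, `make_dispatcher`) are made up for illustration. The dispatcher is picked once when the subscription is created, so subscriptions without a delay never run the delay logic:

```python
from collections import deque
from typing import Deque, List, Tuple


class NormalDispatcher:
    """Unchanged hot path: hands out messages as soon as they are asked for."""

    def __init__(self) -> None:
        self._queue: Deque[Tuple[int, str]] = deque()

    def add(self, publish_time_ms: int, payload: str) -> None:
        self._queue.append((publish_time_ms, payload))

    def dispatch(self, now_ms: int) -> List[str]:
        out = [payload for _, payload in self._queue]
        self._queue.clear()
        return out


class FixedDelayDispatcher(NormalDispatcher):
    """Separate class holding all delay logic; NormalDispatcher is untouched."""

    def __init__(self, delay_ms: int) -> None:
        super().__init__()
        self._delay_ms = delay_ms

    def dispatch(self, now_ms: int) -> List[str]:
        out = []
        # FIFO order is kept, so only the head of the queue needs checking.
        while self._queue and now_ms - self._queue[0][0] >= self._delay_ms:
            out.append(self._queue.popleft()[1])
        return out


def make_dispatcher(delay_ms: int = 0) -> NormalDispatcher:
    """Pick the dispatcher once, when the subscription is created."""
    return FixedDelayDispatcher(delay_ms) if delay_ms > 0 else NormalDispatcher()


d = make_dispatcher(delay_ms=3_000)
d.add(1_000, "m1")
d.add(2_500, "m2")
print(d.dispatch(now_ms=4_500))  # ['m1']  (m1 is 3.5 s old, m2 only 2 s)
```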

- Sijie


On Thu, Feb 14, 2019 at 8:43 AM Joe F <jo...@gmail.com> wrote:

> Delayed subscription is simpler, and probably worth doing in the broker IF
> done right.
>
> How is this different from a subscription running behind?  Why does
> supporting that require this complex a change in the dispatcher, when we
> already support backlogged subscribers?
>
> I am extremely wary of changes in the dispatcher. The recent changes made
> to support DLQ caused major problems with garbage collection, broker
> failure and service interruptions for us. Even though we ARE NOT using the
> DLQ feature. Not a pleasant experience.
>
> This is a very performance-sensitive piece of code, and it should be
> treated as such.
>
> Joe
>
>
>

Re: [DISCUSSION] Delayed message delivery

Posted by Joe F <jo...@gmail.com>.

Delayed subscription is simpler, and probably worth doing in the broker IF
done right.

How is this different from a subscription running behind?  Why does
supporting that require this complex a change in the dispatcher, when we
already support backlogged subscribers?

I am extremely wary of changes in the dispatcher. The recent changes made
to support DLQ caused major problems with garbage collection, broker
failure and service interruptions for us. Even though we ARE NOT using the
DLQ feature. Not a pleasant experience.

This is a very performance-sensitive piece of code, and it should be
treated as such.

Joe



On Wed, Feb 13, 2019 at 3:58 PM Sijie Guo <gu...@gmail.com> wrote:

> Hi all,
>
> I am going to wrap up the discussion regarding delayed delivery use cases.
>
> For arbitrary delayed delivery, there are a few +1s to doing PIP-26 in
> functions. I am assuming that we will go down this path, unless there are
> other proposals.
>
> However, there is a use case Lovelle pointed out about "Fixed Delayed
> Message". More specifically, it is
> https://github.com/apache/pulsar/pull/3155
> (the caption in #3155 is a bit misleading). IMO it is a "delayed
> subscription": basically, the dispatch of all messages in the subscription
> is delayed by a given time interval. Consensus on this feature has not yet
> been reached. Basically, there are two approaches for this:
>
> a) DON'T treat "fixed delayed message" as a different case. Just use the
> same approach as in PIP-26.
> b) Treat "fixed delayed message" as a different case, e.g. we can better
> call it "delayed subscription" or whatever distinguishes it from general
> arbitrary delayed delivery. Use the approach proposed/discussed in #3155.
>
> I would like the community to discuss this and come to an agreement, so
> that Lovelle can move forward with the approach agreed on by the community.
>
> Thanks,
> Sijie
>

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

Hi all,

I am going to wrap up the discussion regarding delayed delivery use cases.

For arbitrary delayed delivery, there are a few +1s to doing PIP-26 in
functions. I am assuming that we will go down this path, unless there are
other proposals.

However, there is a use case Lovelle pointed out about "Fixed Delayed
Message". More specifically, it is https://github.com/apache/pulsar/pull/3155
(the caption in #3155 is a bit misleading). IMO it is a "delayed
subscription": basically, the dispatch of all messages in the subscription
is delayed by a given time interval. Consensus on this feature has not yet
been reached. Basically, there are two approaches for this:

a) DON'T treat "fixed delayed message" as a different case. Just use the
same approach as in PIP-26.
b) Treat "fixed delayed message" as a different case, e.g. we can better
call it "delayed subscription" or whatever distinguishes it from general
arbitrary delayed delivery. Use the approach proposed/discussed in #3155.

I would like the community to discuss this and come to an agreement, so
that Lovelle can move forward with the approach agreed on by the community.

Thanks,
Sijie

On Tue, Jan 29, 2019 at 6:30 AM Ezequiel Lovelle <ez...@gmail.com>
wrote:

> "I agree, but that is *not what #3155 tries to achieve."
>
> This typo made this phrase nonsense, sorry!
>

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

"I agree, but that is *not what #3155 tries to achieve."

This typo made this phrase nonsense, sorry!

On Mon, 28 Jan 2019, 16:44 Ezequiel Lovelle <ezequiellovelle@gmail.com>
wrote:

> > What exactly is the delayed delivery use case?
>
> This is helpful for systems that rely on Pulsar for persistence guarantees
> and use it for synchronization or some sort of checks. In such systems it
> is common to have some overhead when committing data to persistent storage,
> perhaps because of a buffering mechanism or because the data must be
> distributed across the network before it becomes available.
>
> Surely there are more use cases that I can't think of right now.
>
> > Random insertion and deletion is not what FIFO queues like Pulsar are
> > designed for.
>
> I agree, but that is now what #3155 tries to achieve. #3155 is just a
> fixed delay for all messages in a consumer, which is why the
> implementation of #3155 is quite trivial.
>
> +1 from me for doing PIP-26 in functions.
>
> --
> *Ezequiel Lovelle*
>
>

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

> What exactly is the delayed delivery use case?

This is helpful for systems relying on Pulsar for persistence guarantees
and using it for synchronization or some sort of checks. On such
systems it is common to have some overhead committing data to persistent
storage, maybe due to a buffering mechanism or to distributing the data
across the network before it becomes available.

There are surely more use cases that I haven't come across yet.

> Random insertion and deletion is not what FIFO queues like Pulsar are
designed for.

I agree, but that is now what #3155 tries to achieve. #3155 is just a
fixed delay for all messages in a consumer, which is why the
implementation of #3155 is quite trivial.
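
As a rough illustration of why a fixed delay is simple (a sketch only, not
the code in #3155; the class, method, and parameter names are hypothetical):
with one delay for every message, the due time is just the publish time plus
a constant. Assuming publish times do not decrease along the topic, due times
follow the same order, so only the next message needs to be checked.

```java
/** Hypothetical helper, not part of Pulsar or of PR #3155. */
final class FixedDelay {
    private FixedDelay() {}

    /**
     * Returns how long a message must still be held back, in milliseconds.
     * A result of 0 means the message is already due.
     */
    static long remainingDelayMillis(long publishTimeMillis,
                                     long fixedDelayMillis,
                                     long nowMillis) {
        long dueAtMillis = publishTimeMillis + fixedDelayMillis;
        return Math.max(0L, dueAtMillis - nowMillis);
    }
}
```

No sorting structure is needed in this case, which is the main difference
from arbitrary per-message delays.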

+1 from me for doing PIP-26 in functions.

--
*Ezequiel Lovelle*

On Sat, 26 Jan 2019 at 09:57, Yuva raj <uv...@gmail.com> wrote:

> Considering the way pulsar is built +1 for doing PIP-26 in functions. I am
> more of thinking in a way like publish it pulsar we will make it available
> in a different queuing system if you need priority and delay messages
> support. Pulsar functions would go enough for this kind of use cases.
>
> On Fri, 25 Jan 2019 at 22:29, Ivan Kelly <iv...@apache.org> wrote:
>
> > > Correct. PIP-26 can be implemented in Functions. I believe the last
> > > discussion in PIP-26 thread kind of agree on functions approach.
> > > If the community is okay with PIP-26 in functions, I think that is
> > probably
> > > a good approach to start.
> >
> > +1 for doing it in functions.
> >
> > -Ivan
> >
>
>
> --
> *Thanks*
>
> *Yuvaraj L*
>

Re: [DISCUSSION] Delayed message delivery

Posted by Yuva raj <uv...@gmail.com>.

Considering the way Pulsar is built, +1 for doing PIP-26 in functions. My
thinking is more along the lines of: publish to Pulsar, and we make the
message available in a different queuing system if you need priority and
delayed message support. Pulsar Functions would be good enough for this
kind of use case.

On Fri, 25 Jan 2019 at 22:29, Ivan Kelly <iv...@apache.org> wrote:

> > Correct. PIP-26 can be implemented in Functions. I believe the last
> > discussion in PIP-26 thread kind of agree on functions approach.
> > If the community is okay with PIP-26 in functions, I think that is
> probably
> > a good approach to start.
>
> +1 for doing it in functions.
>
> -Ivan
>

-- 
*Thanks*

*Yuvaraj L*

Re: [DISCUSSION] Delayed message delivery

Posted by Ivan Kelly <iv...@apache.org>.

> Correct. PIP-26 can be implemented in Functions. I believe the last
> discussion in PIP-26 thread kind of agree on functions approach.
> If the community is okay with PIP-26 in functions, I think that is probably
> a good approach to start.

+1 for doing it in functions.

-Ivan

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

On Thu, Jan 24, 2019 at 8:13 AM Joe F <jo...@gmail.com> wrote:

>   To me this discussion presupposes that a streaming system should provide
> a service like a database. Before we   discuss about how to implement this,
> we should look at whether this is something that fits into what is the core
> of Pulsar. I still have the same concerns against doing this in the broker
> dispatch side.
>
> What exactly is the delayed delivery use case?  Random insertion, dynamic
> sorting,  and deletion from the top of the sort.  That is a priority queue.
> It is best implemented as a heap. For larger sets it's some sort of tree
> structure. You can simulate that on a database with an index.
>
> Random insertion and deletion is not what FIFO queues like Pulsar are
> designed for.  The closest thing I can think of with Pulsar is to build an
> in-mem priority queue in a Pulsar function, feed it from an input topic and
> publish the top of the queue into a separate output topic.

> In fact the
> entire logic proposed in PIP-26 can be done outside the broker in a Pulsar
> function.
>

Correct. PIP-26 can be implemented in Functions. I believe the last
discussion in the PIP-26 thread more or less agreed on the functions
approach. If the community is okay with PIP-26 in functions, I think that
is probably a good approach to start with.


>
> For a small scale setup, these distinctions do not matter - you can use a
> database as a queue and a queue as a database. But at any larger scale, a
> streaming system is not the correct solution for a priority queue use case,
> whether it's Pulsar or some other streaming system. So far I have not seen
> any mention of the target scale for the design, or the specific use case
> requirements
>
> -joe
>
>
> On Sat, Jan 19, 2019 at 6:43 PM PengHui Li <co...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > Actually, I also prefer to simplify at broker side.
> >
> > If pulsar support set arbitrary timeout on each message, if not cluster
> > failure or consumer failure,
> > it needs to behave normally(on time). Otherwise, user need to understand
> > how pulsar dispatching
> > messages and how limit of unacked messages change the delay message
> > behavior. This may
> > lead users to hesitate, this feature may be misused when the user does
> not
> > fully understand how it works.
> >
> > When user depends arbitrary timeout message feature, users just need to
> > keep producer and consumer
> > is work well, and administrator of pulsar need to keep pulsar cluster
> work
> > well.
> >
> > I don't think pulsar is very necessary to support this feature(arbitrary
> > timeout message),
> > In most scenarios, #3155 can work well, In a few cases, even if support
> > arbitrary timeout message in
> > client side, i believe that still can not meet the requirement of all
> > delayed messages.
> >
> > To me, i’m not against support arbitrary timeout on each message on
> client
> > side, maybe this is useful
> > for other users. In some of our scenarios, we also need a more functional
> > alternative(a task service).
> >
> > Of course, If we can integrate a task service, we can use pulsar to
> > guaranteed delivery of messages,
> > task service guaranteed send message to pulsar success. Or pulsar broker
> > support filter server.
> > This way users can implement their own task services.
> >
> > Ezequiel Lovelle <ez...@gmail.com> 于2019年1月20日周日 上午12:28写道：
> >
> > > > If the goal is to minimize the amount of redeliveries from broker ->
> > > client, there are multiple ways to achieve that with the client based
> > > approach
> > > (eg. send message id and delay time instead of the full payload to
> > > consumers
> > > as Ivan proposed).
> > >
> > > But the main reason to put this logic on client side was not adding
> delay
> > > related logic on broker side, in order to do this optimisations the
> > broker
> > > must be aware of delayed message and only send message id and delay
> time
> > > without payload.
> > >
> > > > I don't necessarily agree with that. NTP is widely available
> > > and understood. Any application that's doing anything time-related
> would
> > > have
> > > to make sure the clocks are reasonably synced.
> > >
> > > Yep, that's true, but from my point of view a system that depends on
> > client
> > > side clock is weaker than a system that does this kind of calculation
> at
> > > a more controlled environment aka backend. This adds one more factor
> that
> > > depends on the user doing things right, which is not always the case.
> > >
> > > One possible solution might be the broker send periodically its current
> > > epoch time and the client do the calculations with this data, or send
> > epoch
> > > time initially at subscription and do the rest of calculations doing
> > delta
> > > of
> > > time using the initial time from broker as a base (time flows equally
> for
> > > both
> > > the important thing is which one is positioned at the very present
> time).
> > >
> > > Anyway this mentioned approach sound like an a hack just from the fact
> of
> > > not doing the time calculations in the backend.
> > >
> > > > Lastly, i do agree client side approaches have better scalability
> than
> > > server side approaches in most cases. However I don’t believe that it
> is
> > > the case here. And I don’t see anyone have a clear explanation on why a
> > > broker approach is less scalable than the client side approach.
> > >
> > > Yes, I agree with this. At least for fixed time delay at pr #3155.
> > >
> > > The only remained concern to me would be Gc usage of stored positions
> > > next to be expired, anyway, since the nature of a fixed delay and
> > > from the fact that process a ledger tend to be in a sequentially
> manner,
> > > we could store a range of positions id for some delta when intensive
> > > traffic is going on, I believe I did this mention on the pr.
> > >
> > > > Again, in general I'm more concerned of stuff that happens in broker
> > > because
> > > it will have to be scaled up 10s of thousands of times in a single
> > > process, while in client typically the requirements are much simpler.
> > >
> > > I agree that adding logic to broker should be considered with deep
> care,
> > > but in this specific scenario at worst case we will only have one and
> > only
> > > one scheduled task per consumer which will take all expired positions
> > > from a DelayQueue.
> > >
> > > --
> > > *Ezequiel Lovelle*
> > >
> > >
> > > On Sat, 19 Jan 2019 at 01:02, Matteo Merli <ma...@gmail.com>
> > wrote:
> > >
> > > > Just a quick correction:
> > > >
> > > > > And I don’t see anyone have a clear explanation on why a
> > > > broker approach is less scalable than the client side approach.
> > > >
> > > > I haven't said it less or more scalable. I was meaning that it's
> > > > "easier" to scale, in that we don't have to do lots of fancy stuff
> > > > and add more and more control to make sure that the implementation
> > > > will not become a concern point at scale (eg: limit the overall
> > > > amount of memory used in broker, across all topics, and the
> > > > impact on GC of these long-living objects).
> > > >
> > > > > However, clock skew in a brokerside approach
> > > > is easier to manage and more predictable, but clock skew in a
> > clientside
> > > > approach is much harder to manage and more unpredictable
> > > >
> > > > I don't necessarily agree with that. NTP is widely available
> > > > and understood.
> > > > Any application that's doing anything time-related would have
> > > > to make sure the clocks are reasonably synced.
> > > >
> > > > --
> > > > Matteo Merli
> > > > <ma...@gmail.com>
> > > >
> > > > On Fri, Jan 18, 2019 at 7:46 PM Sijie Guo <gu...@gmail.com>
> wrote:
> > > > >
> > > > > On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <
> matteo.merli@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > > Trying to group and compress responses here.
> > > > > >
> > > > > > > If consumer control the delayed message specific execution time
> > we
> > > > must
> > > > > > trust clock of consumer, this can cause delayed message process
> > ahead
> > > > of
> > > > > > time, some applications cannot tolerate this condition.
> > > > > >
> > > > > > This is a problem that cannot be solved.
> > > > > > Even assuming the timestamps are assigned by brokers and are
> > > guaranteed
> > > > > > to be monotonic, this won't prevent 2 brokers from having clock
> > > skews.
> > > > > > That would results in different delivery delays.
> > > > > >
> > > > > > Similarly, the broker timestamp might be assigned later compared
> to
> > > > when a
> > > > > > publisher was "intending" to start the clock.
> > > > > >
> > > > > > Barring super-precise clock synchronization techniques (which are
> > way
> > > > out
> > > > > > of the scope of this discussion), the only reasonable way to
> think
> > > > about
> > > > > > this is
> > > > > > that delays needs to be orders of magnitudes bigger than the
> > average
> > > > clock
> > > > > > skew experienced with common techniques (eg: NTP). NTP clock skew
> > > will
> > > > > > generally be in the 10s of millis. Any delay > 1 seconds will
> > hardly
> > > be
> > > > > > noticeably affected by these skews.
> > > > > >
> > > > > > Additionally, any optimization on the timeouts handling (like the
> > > > > > hash-wheel
> > > > > > timer proposed in PIP-26) will trade off precision for
> efficiency.
> > In
> > > > that
> > > > > > case,
> > > > > > the delays are managed in buckets, and can result in higher
> delays
> > > that
> > > > > > what was requested.
> > > > > >
> > > > > > > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is
> the
> > > > largest
> > > > > > proportion in throughput of delayed message . A subscription
> with a
> > > > fixed
> > > > > > delayed time can approach to this scene.
> > > > > >
> > > > > > I don't think that for fixed delays, any server-side
> implementation
> > > > > > would provide
> > > > > > any advantage compared to doing:
> > > > > >
> > > > > > ```
> > > > > > while (true) {
> > > > > >     Message msg = consumer.receive();
> > > > > > >     long delayMillis = calculateDelay(msg);
> > > > > >     if (delayMillis > 0) {
> > > > > >         Thread.sleep(delayMillis);
> > > > > >     }
> > > > > >
> > > > > >     // Do something
> > > > > >     consumer.acknowledge(msg);
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > This will not need any support from broker. Also, there will be
> no
> > > > > > redeliveries.
> > > > > >
> > > > > > It could be wrapped in the client API, although I don't see that
> as
> > > > > > big of a problem.
> > > > > >
> > > > > > > My concern of this category of approaches is "bandwidth" usage.
> > It
> > > is
> > > > > > basically trading bandwidth for complexity.
> > > > > >
> > > > > > With mixed delays on a single topic, in any case there has to be
> > some
> > > > kind
> > > > > > of time-based sorting of the messages that needs to happen either
> > at
> > > > broker
> > > > > > or at client.
> > > > > >
> > > > > > Functionally, I believe that either place is equivalent (from a
> > user
> > > > > > point of view),
> > > > > > barring the different implementation requirements.
> > > > > >
> > > > > > In my view, the bigger cost here is not bandwidth but rather the
> > disk
> > > > > > IO, that will
> > > > > > happen exactly in the same way in both cases. Messages can be
> > cached,
> > > > > > up to a certain point, either in broker or in client library.
> After
> > > > > > that, in both cases,
> > > > > > the messages will have to be fetched from bookies.
> > > > > >
> > > > > > Also, when implementing the delay feature in the client, the
> > existing
> > > > > > flow control
> > > > > > mechanism is naturally applied to limit the overall amount of
> > > > information
> > > > > > that
> > > > > > we have to keep track (the "currently tracked" messages). Some
> > other
> > > > > > mechanism
> > > > > > would have to be done in the broker as well.
> > > > > >
> > > > > > Again, in general I'm more concerned of stuff that happens in
> > broker
> > > > > > because
> > > > > > it will have to be scaled up 10s of thousands of times in a
> single
> > > > > > process, while
> > > > > > in client typically the requirements are much simpler.
> > > > > >
> > > > > > If the goal is to minimize the amount of redeliveries from broker
> > ->
> > > > > > client, there
> > > > > > are multiple ways to achieve that with the client based approach
> > (eg.
> > > > send
> > > > > > message id and delay time instead of the full payload to
> consumers
> > as
> > > > Ivan
> > > > > > proposed).
> > > > > >
> > > > > > This seems to be simpler and with less overhead than having to
> > > persist
> > > > > > the whole
> > > > > > hashweel timer state into a ledger.
> > > > >
> > > > >
> > > > > I agree with that there are many optimizations can be applied at a
> > > client
> > > > > side approach. In a stable world, these approaches are technically
> > > > > equivalent.
> > > > >
> > > > > However I do not agree with a few points:
> > > > >
> > > > > First, based on my past production experiences, network bandwidth
> on
> > > > broker
> > > > > is the bigger cost than io cost in a multi subscription case.
> Also, I
> > > > have
> > > > > heard a few production users have experienced latency issues where
> > > broker
> > > > > network bandwidth is saturated. So any mechanisms that rely on
> > > > redeliveries
> > > > > are a big red flag to me.
> > > > >
> > > > > Secondly, currently pulsar is using more bandwidth on brokers, than
> > > > > bandwidth on bookies. It is not a balanced state. I am more leaning
> > > > towards
> > > > > an approach that can leverage bookies’ idle bandwidth, rather than
> > > > > potentially using more bandwidth on brokers.
> > > > >
> > > > > Thirdly, in my view, clock skew concern is not a technical issue,
> > but a
> > > > > management issue. As what Ivan and you have pointed out, there are
> > many
> > > > > ways on addressing clock skew. However, clock skew in a brokerside
> > > > approach
> > > > > is easier to manage and more predictable, but clock skew in a
> > > clientside
> > > > > approach is much harder to manage and more unpredictable. This
> > > > > unpredictability can significantly change the io or network pattern
> > > when
> > > > > things go bad. When such unpredictability happens, it can cause bad
> > > > things
> > > > > and saturating broker network in a redeliver-ish approach. If we
> are
> > > > > building a distributed system that can handle this
> unpredictability,
> > a
> > > > > broker-side approach is much more friendly to managebility and
> > incident
> > > > > management.
> > > > >
> > > > > Lastly, i do agree client side approaches have better scalability
> > than
> > > > > server side approaches in most cases. However I don’t believe that
> it
> > > is
> > > > > the case here. And I don’t see anyone have a clear explanation on
> > why a
> > > > > broker approach is less scalable than the client side approach.
> > > > >
> > > > > Anyway, for managebility, bandwidth usage, client simplicity, I am
> > more
> > > > in
> > > > > favor of a broker side approach, or at least an approach that is
> not
> > > > > redelivery based. However since the feature is requested by Penghui
> > > > > and Ezequiel,
> > > > > I am also fine with this client side approach if they are okay with
> > > that.
> > > > >
> > > > > - Sijie
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Matteo Merli
> > > > > > <ma...@gmail.com>
> > > > > >
> > > > > >
> > > > > > On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
> > > > > > <ez...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi All! and sorry for delay :)
> > > > > > >
> > > > > > > Probably I'm going to say some things already said, so sorry
> > > > beforehand.
> > > > > > >
> > > > > > > The two main needed features I think are the proposed:
> > > > > > > A. Producer delay PIP-26. B. Consumers delay PR #3155
> > > > > > >
> > > > > > > Of course PIP-26 would result in consumers receiving delayed
> > > messages
> > > > > > > but the important thing here is one of them made the decision
> > about
> > > > > > delay.
> > > > > > >
> > > > > > > First, the easy one, PR #3155. Consumers delay:
> > > > > > >
> > > > > > > As others have stated before, this is a more trivial approach
> > > because
> > > > > > > of the nature of having the exactly same period of delay for
> each
> > > > message
> > > > > > > which is predictable.
> > > > > > >
> > > > > > > I agree that adding logic at broker should be avoided, but, for
> > > this
> > > > > > > specific feature #3155 which I don't think is complex I believe
> > > there
> > > > > > > are others serious advantages:
> > > > > > >
> > > > > > >  1. Simplicity at client side, we don't need to add any code
> > which
> > > is
> > > > > > >     less error prone.
> > > > > > >  2. Clock issues from client side being outdated and causing
> > > headache
> > > > > > >     to users detecting this.
> > > > > > >  3. Avoids huge overhead delivering non expired messages across
> > the
> > > > > > >     network unnecessary.
> > > > > > >  4. Consumers are free to decide to consume messages with delay
> > > > > > regardless
> > > > > > >     of the producer.
> > > > > > >  5. Delay is uniform for all messages, which sometimes is the
> > > > solution
> > > > > > >     to the problem rather than arbitrary delays.
> > > > > > >
> > > > > > > I think that would be great if pulsar can provide this kind of
> > > > features
> > > > > > > without relaying on users needing to know heavy details about
> the
> > > > > > > mechanism.
> > > > > > >
> > > > > > > For PIP-26:
> > > > > > >
> > > > > > > I think we can offer this with the purpose of message's with a
> > more
> > > > long
> > > > > > > delay in terms of time? hours / days?
> > > > > > >
> > > > > > > So, if this is the case, we can assume a small granularity of
> > time
> > > > like
> > > > > > > 1 minute making ledger's representing 1 minute of time and
> > > truncating
> > > > > > > each time of message for it corresponding minute and storing in
> > > that
> > > > > > > special ledger.
> > > > > > > Users wanting to receive a messages scheduled for some days in
> > > future
> > > > > > > rarely would care of a margin of error of 1 minute.
> > > > > > >
> > > > > > > Of course we need somehow make the broker aware of this in
> order
> > to
> > > > only
> > > > > > > process ledger's for current corresponding minute and consume
> it.
> > > > > > > And the broker would be the one subject to close current minute
> > > > truncated
> > > > > > > processed ledger.
> > > > > > >
> > > > > > > One problem I can think about this approach, is it painful for
> > > > Bookkeeper
> > > > > > > to having a lot of opened ledgers? (one for each minute per
> > topic)
> > > > > > >
> > > > > > > Another problem here might be what happen if consumer was not
> > > > started?
> > > > > > > At startup time the broker should looking for potentially older
> > > > ledger's
> > > > > > > than its current time and this might be expensive.
> > > > > > >
> > > > > > > Other more trivial issue, we might need to refactor current
> > > mechanism
> > > > > > > which deletes closed ledgers older than the configured time on
> > name
> > > > > > space.
> > > > > > >
> > > > > > > As a final note I think that would be great to have both
> features
> > > in
> > > > > > pulsar
> > > > > > > but sometimes not everything desired is achievable.
> > > > > > > And please correct me if I said something senseless.
> > > > > > >
> > > > > > > --
> > > > > > > *Ezequiel Lovelle*
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 18 Jan 2019 at 05:51, PengHui Li <
> > codelipenghui@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > > So rather than specifying the absolute timestamp that the
> > > message
> > > > > > > > > should appear to the user, the dispatcher can specify the
> > > > relative
> > > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > > >
> > > > > > > > As matteo said the worst case would be that the applied delay
> > to
> > > be
> > > > > > higher
> > > > > > > > for some of the messages, if specify the relative delay to
> > > > consumer,
> > > > > > > > if consumer offline for a period of time, consumer will
> receive
> > > > many
> > > > > > > > delayed messages
> > > > > > > > after connect to broker again will cause the worst case more
> > > > serious.
> > > > > > It's
> > > > > > > > difficult to keep
> > > > > > > > consumers always online.
> > > > > > > >
> > > > > > > > In my personal perspective, i refer to use `delay level
> topic`
> > to
> > > > > > approach
> > > > > > > > smaller delays scene.
> > > > > > > > e.g(10s-topic, 30s-topic), this will not be too much topic.
> And
> > > we
> > > > are
> > > > > > > > using dead letter topic to simulate
> > > > > > > > delay message feature, delayed topics has different delay
> > level.
> > > > > > > >
> > > > > > > > For very long delays scene, in our practice, user may cancel
> it
> > > or
> > > > > > restart
> > > > > > > > it.
> > > > > > > > After previous discussions, i agree that PIP-26 will make
> > broker
> > > > > > > > more complexity.
> > > > > > > > So I had the idea to consider as a separate mechanism.
> > > > > > > >
> > > > > > > >
> > > > > > > > Sijie Guo <gu...@gmail.com> 于2019年1月18日周五 下午3:22写道：
> > > > > > > >
> > > > > > > > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <
> ivank@apache.org
> > >
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > One thing missing from this discussion is details on the
> > > > motivating
> > > > > > > > > > use-case. How many delayed messages per second are we
> > > > expecting?
> > > > > > And
> > > > > > > > > > what is the payload size?
> > > > > > > > > >
> > > > > > > > > > > If consumer control the delayed message specific
> > execution
> > > > time
> > > > > > we
> > > > > > > > must
> > > > > > > > > > > trust clock of consumer, this can cause delayed message
> > > > process
> > > > > > ahead
> > > > > > > > > of
> > > > > > > > > > > time, some applications cannot tolerate this condition.
> > > > > > > > > >
> > > > > > > > > > This can be handled in a number of ways. Consumer clocks
> > can
> > > be
> > > > > > skewed
> > > > > > > > > > with regard to other clocks, but it is generally safe to
> > > assume
> > > > > > that
> > > > > > > > > > clocks advance at the same rate, especially at the
> > > granularity
> > > > of a
> > > > > > > > > > couple of hours.
> > > > > > > > > > So rather than specifying the absolute timestamp that the
> > > > message
> > > > > > > > > > should appear to the user, the dispatcher can specify the
> > > > relative
> > > > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > > > > >
> > > > > > > > > > > > My concern of this category of approaches is
> > "bandwidth"
> > > > > > usage. It
> > > > > > > > is
> > > > > > > > > > > > basically trading bandwidth for complexity.
> > > > > > > > > > >
> > > > > > > > > > > @Sijie Guo <si...@apache.org> Agree with you, such an
> > > > trading
> > > > > > can
> > > > > > > > > cause
> > > > > > > > > > the
> > > > > > > > > > > broker's out going network to be more serious.
> > > > > > > > > >
> > > > > > > > > > I don't think PIP-26's approach may not use less
> bandwidth
> > in
> > > > this
> > > > > > > > > > regard. With PIP-26, the msg ids are stored in a ledger,
> > and
> > > > when
> > > > > > the
> > > > > > > > > > timeout triggers it dispatches? Are all the delayed
> message
> > > > being
> > > > > > > > > > cached at the broker? If so, that is using a lot of
> memory,
> > > and
> > > > > > it's
> > > > > > > > > > exactly the kind of memory usage pattern that is very bad
> > for
> > > > JVM
> > > > > > > > > > garbage collection. If not, then you have to read the
> > message
> > > > back
> > > > > > in
> > > > > > > > > > from bookkeeper, so the bandwidth usage is the same,
> though
> > > on
> > > > a
> > > > > > > > > > different path.
> > > > > > > > > >
> > > > > > > > > > In the client side approach, the message could be cached
> to
> > > > avoid a
> > > > > > > > > > redispatch. When I was discussing with Matteo, we
> discussed
> > > > this.
> > > > > > The
> > > > > > > > > > redelivery logic has to be there in any case, as any
> cache
> > > > (broker
> > > > > > or
> > > > > > > > > > client side) must have a limited size.
> > > > > > > > > > Another option would be to skip sending the payload for
> > > delayed
> > > > > > > > > > messages, and only send it when the client request
> > > redelivery,
> > > > but
> > > > > > > > > > this has the same issue with regard to the entry likely
> > > > falling out
> > > > > > > > > > the cache at the broker-side.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > There are bandwidth usage at either approaches for sure.
> The
> > > main
> > > > > > > > > difference between broker-side and client-side approaches
> is
> > > > which
> > > > > > part
> > > > > > > > of
> > > > > > > > > the bandwidth is used.
> > > > > > > > >
> > > > > > > > > In the broker-side approach, it is using the bookies egress
> > and
> > > > > > broker
> > > > > > > > > ingress bandwidth. In a typical pulsar deployment, bookies
> > > > egress is
> > > > > > > > mostly
> > > > > > > > > idle unless there are consumers falling behind.
> > > > > > > > >
> > > > > > > > > In the client-side approach, it is using broker’s egress
> > > > bandwidth
> > > > > > and
> > > > > > > > > potentially bookies’ egress bandwidth. Brokers’ egress is
> > > > critical
> > > > > > since
> > > > > > > > it
> > > > > > > > > is shared across consumers. So if the broker egress is
> > doubled,
> > > > it
> > > > > > is a
> > > > > > > > red
> > > > > > > > > flag.
> > > > > > > > >
> > > > > > > > > Although I agree the bandwidth usage depends on workloads.
> > But
> > > in
> > > > > > theory,
> > > > > > > > > broker-side approach is more friendly to resource usage
> and a
> > > > better
> > > > > > > > > approach to use the resources in a multi layered
> > architecture.
> > > > > > Because it
> > > > > > > > > uses less bandwidth at broker side. A client side can cause
> > > more
> > > > > > > > bandwidth
> > > > > > > > > usage at broker side.
> > > > > > > > >
> > > > > > > > > Also as what penghui pointed out, clock screw can be
> another
> > > > factor
> > > > > > > > causing
> > > > > > > > > more traffic in a fanout case. In a broker-side approach,
> the
> > > > > > deferred is
> > > > > > > > > done in a central point, so when the deferred time point
> > kicks
> > > > in,
> > > > > > broker
> > > > > > > > > just need to read the data one time from bookies. However
> in
> > a
> > > > > > > > client-side
> > > > > > > > > approach, the messages are asked by different
> subscriptions,
> > > > > > different
> > > > > > > > > subscription can ask the deferred message at any time based
> > on
> > > > their
> > > > > > > > > clocks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -Ivan
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Joe F <jo...@gmail.com>.

To me this discussion presupposes that a streaming system should provide
a service like a database. Before we discuss how to implement this,
we should look at whether this is something that fits into the core
of Pulsar. I still have the same concerns about doing this on the broker
dispatch side.

What exactly is the delayed delivery use case?  Random insertion, dynamic
sorting,  and deletion from the top of the sort.  That is a priority queue.
It is best implemented as a heap. For larger sets it's some sort of tree
structure. You can simulate that on a database with an index.

Random insertion and deletion is not what FIFO queues like Pulsar are
designed for.  The closest thing I can think of with Pulsar is to build an
in-mem priority queue in a Pulsar function, feed it from an input topic and
publish the top of the queue into a separate output topic.  In fact the
entire logic proposed in PIP-26 can be done outside the broker in a Pulsar
function.
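
To make the priority-queue idea concrete, here is a minimal sketch of the
in-memory structure such a function could hold. It is only an illustration
under assumptions: `DelayBuffer`, `Entry` and `deliverAtMillis` are
hypothetical names, the payload is a plain string, and no Pulsar Functions
API is used. Reading from the input topic and publishing to the output
topic are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: buffers delayed messages in a heap ordered by due time. */
final class DelayBuffer {

    /** One buffered message; deliverAtMillis is an assumed metadata field. */
    private static final class Entry {
        final String payload;
        final long deliverAtMillis;

        Entry(String payload, long deliverAtMillis) {
            this.payload = payload;
            this.deliverAtMillis = deliverAtMillis;
        }
    }

    // Min-heap: the entry with the earliest due time is always at the head.
    private final PriorityQueue<Entry> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(a.deliverAtMillis, b.deliverAtMillis));

    /** Random insertion: O(log n). */
    void add(String payload, long deliverAtMillis) {
        heap.add(new Entry(payload, deliverAtMillis));
    }

    /** Removes and returns every payload due at or before nowMillis, earliest first. */
    List<String> drainDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().deliverAtMillis <= nowMillis) {
            due.add(heap.poll().payload);
        }
        return due;
    }

    int size() {
        return heap.size();
    }
}
```

A periodic task would call `drainDue` with the current time and publish the
returned payloads to the output topic. Everything lives in memory here, so
the buffer is lost on restart and is bounded by heap size, which is the
scale concern raised below.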

For a small scale setup, these distinctions do not matter - you can use a
database as a queue and a queue as a database. But at any larger scale, a
streaming system is not the correct solution for a priority queue use case,
whether it's Pulsar or some other streaming system. So far I have not seen
any mention of the target scale for the design, or the specific use case
requirements.

-joe


On Sat, Jan 19, 2019 at 6:43 PM PengHui Li <co...@gmail.com> wrote:

> Hi All,
>
> Actually, I also prefer to simplify at broker side.
>
> If pulsar support set arbitrary timeout on each message, if not cluster
> failure or consumer failure,
> it needs to behave normally(on time). Otherwise, user need to understand
> how pulsar dispatching
> messages and how limit of unacked messages change the delay message
> behavior. This may
> lead users to hesitate, this feature may be misused when the user does not
> fully understand how it works.
>
> When user depends arbitrary timeout message feature, users just need to
> keep producer and consumer
> is work well, and administrator of pulsar need to keep pulsar cluster work
> well.
>
> I don't think pulsar is very necessary to support this feature(arbitrary
> timeout message),
> In most scenarios, #3155 can work well, In a few cases, even if support
> arbitrary timeout message in
> client side, i believe that still can not meet the requirement of all
> delayed messages.
>
> To me, i’m not against support arbitrary timeout on each message on client
> side, maybe this is useful
> for other users. In some of our scenarios, we also need a more functional
> alternative(a task service).
>
> Of course, If we can integrate a task service, we can use pulsar to
> guaranteed delivery of messages,
> task service guaranteed send message to pulsar success. Or pulsar broker
> support filter server.
> This way users can implement their own task services.
>
> Ezequiel Lovelle <ez...@gmail.com> wrote on Sun, Jan 20, 2019 at 12:28 AM:
>
> > > If the goal is to minimize the amount of redeliveries from broker ->
> > client, there are multiple ways to achieve that with the client based
> > approach
> > (eg. send message id and delay time instead of the full payload to
> > consumers
> > as Ivan proposed).
> >
> > But the main reason to put this logic on client side was not adding delay
> > related logic on broker side, in order to do this optimisations the
> broker
> > must be aware of delayed message and only send message id and delay time
> > without payload.
> >
> > > I don't necessarily agree with that. NTP is widely available
> > and understood. Any application that's doing anything time-related would
> > have
> > to make sure the clocks are reasonably synced.
> >
> > Yep, that's true, but from my point of view a system that depends on
> client
> > side clock is weaker than a system that does this kind of calculation at
> > a more controlled environment aka backend. This adds one more factor that
> > depends on the user doing things right, which is not always the case.
> >
> > One possible solution might be the broker send periodically its current
> > epoch time and the client do the calculations with this data, or send
> epoch
> > time initially at subscription and do the rest of calculations doing
> delta
> > of
> > time using the initial time from broker as a base (time flows equally for
> > both
> > the important thing is which one is positioned at the very present time).
> >
> > Anyway this mentioned approach sounds like a hack just from the fact of
> > not doing the time calculations in the backend.
> >
> > > Lastly, i do agree client side approaches have better scalability than
> > server side approaches in most cases. However I don’t believe that it is
> > the case here. And I don’t see anyone have a clear explanation on why a
> > broker approach is less scalable than the client side approach.
> >
> > Yes, I agree with this. At least for fixed time delay at pr #3155.
> >
> > The only remained concern to me would be Gc usage of stored positions
> > next to be expired, anyway, since the nature of a fixed delay and
> > from the fact that process a ledger tend to be in a sequentially manner,
> > we could store a range of positions id for some delta when intensive
> > traffic is going on, I believe I did this mention on the pr.
> >
> > > Again, in general I'm more concerned of stuff that happens in broker
> > because
> > it will have to be scaled up 10s of thousands of times in a single
> > process, while in client typically the requirements are much simpler.
> >
> > I agree that adding logic to broker should be considered with deep care,
> > but in this specific scenario at worst case we will only have one and
> only
> > one scheduled task per consumer which will take all expired positions
> > from a DelayQueue.
> >
> > --
> > *Ezequiel Lovelle*
> >
> >
> > On Sat, 19 Jan 2019 at 01:02, Matteo Merli <ma...@gmail.com>
> wrote:
> >
> > > Just a quick correction:
> > >
> > > > And I don’t see anyone have a clear explanation on why a
> > > broker approach is less scalable than the client side approach.
> > >
> > > I haven't said it less or more scalable. I was meaning that it's
> > > "easier" to scale, in that we don't have to do lots of fancy stuff
> > > and add more and more control to make sure that the implementation
> > > will not become a concern point at scale (eg: limit the overall
> > > amount of memory used in broker, across all topics, and the
> > > impact on GC of these long-living objects).
> > >
> > > > However, clock skew in a brokerside approach
> > > is easier to manage and more predictable, but clock skew in a
> clientside
> > > approach is much harder to manage and more unpredictable
> > >
> > > I don't necessarily agree with that. NTP is widely available
> > > and understood.
> > > Any application that's doing anything time-related would have
> > > to make sure the clocks are reasonably synced.
> > >
> > > --
> > > Matteo Merli
> > > <ma...@gmail.com>
> > >
> > > On Fri, Jan 18, 2019 at 7:46 PM Sijie Guo <gu...@gmail.com> wrote:
> > > >
> > > > On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <matteo.merli@gmail.com
> >
> > > wrote:
> > > >
> > > > > Trying to group and compress responses here.
> > > > >
> > > > > > If consumer control the delayed message specific execution time
> we
> > > must
> > > > > trust clock of consumer, this can cause delayed message process
> ahead
> > > of
> > > > > time, some applications cannot tolerate this condition.
> > > > >
> > > > > This is a problem that cannot be solved.
> > > > > Even assuming the timestamps are assigned by brokers and are
> > guaranteed
> > > > > to be monotonic, this won't prevent 2 brokers from having clock
> > skews.
> > > > > > That would result in different delivery delays.
> > > > >
> > > > > Similarly, the broker timestamp might be assigned later compared to
> > > when a
> > > > > publisher was "intending" to start the clock.
> > > > >
> > > > > Barring super-precise clock synchronization techniques (which are
> way
> > > out
> > > > > of the scope of this discussion), the only reasonable way to think
> > > about
> > > > > this is
> > > > > > that delays need to be orders of magnitude bigger than the
> average
> > > clock
> > > > > skew experienced with common techniques (eg: NTP). NTP clock skew
> > will
> > > > > generally be in the 10s of millis. Any delay > 1 seconds will
> hardly
> > be
> > > > > noticeably affected by these skews.
> > > > >
> > > > > Additionally, any optimization on the timeouts handling (like the
> > > > > hash-wheel
> > > > > timer proposed in PIP-26) will trade off precision for efficiency.
> In
> > > that
> > > > > case,
> > > > > the delays are managed in buckets, and can result in higher delays
> > than
> > > > > what was requested.
> > > > >
> > > > > > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the
> > > largest
> > > > > proportion in throughput of delayed message . A subscription with a
> > > fixed
> > > > > delayed time can approach to this scene.
> > > > >
> > > > > I don't think that for fixed delays, any server-side implementation
> > > > > would provide
> > > > > any advantage compared to doing:
> > > > >
> > > > > ```
> > > > > while (true) {
> > > > >     Message msg = consumer.receive();
> > > > > >     long delayMillis = calculateDelay(msg);
> > > > >     if (delayMillis > 0) {
> > > > >         Thread.sleep(delayMillis);
> > > > >     }
> > > > >
> > > > >     // Do something
> > > > >     consumer.acknowledge(msg);
> > > > > }
> > > > > ```
> > > > >
> > > > > This will not need any support from broker. Also, there will be no
> > > > > redeliveries.
> > > > >
> > > > > It could be wrapped in the client API, although I don't see that as
> > > > > big of a problem.
> > > > >
> > > > > > My concern of this category of approaches is "bandwidth" usage.
> It
> > is
> > > > > basically trading bandwidth for complexity.
> > > > >
> > > > > With mixed delays on a single topic, in any case there has to be
> some
> > > kind
> > > > > of time-based sorting of the messages that needs to happen either
> at
> > > broker
> > > > > or at client.
> > > > >
> > > > > Functionally, I believe that either place is equivalent (from a
> user
> > > > > point of view),
> > > > > barring the different implementation requirements.
> > > > >
> > > > > In my view, the bigger cost here is not bandwidth but rather the
> disk
> > > > > IO, that will
> > > > > happen exactly in the same way in both cases. Messages can be
> cached,
> > > > > up to a certain point, either in broker or in client library. After
> > > > > that, in both cases,
> > > > > the messages will have to be fetched from bookies.
> > > > >
> > > > > Also, when implementing the delay feature in the client, the
> existing
> > > > > flow control
> > > > > mechanism is naturally applied to limit the overall amount of
> > > information
> > > > > that
> > > > > we have to keep track (the "currently tracked" messages). Some
> other
> > > > > mechanism
> > > > > would have to be done in the broker as well.
> > > > >
> > > > > Again, in general I'm more concerned of stuff that happens in
> broker
> > > > > because
> > > > > it will have to be scaled up 10s of thousands of times in a single
> > > > > process, while
> > > > > in client typically the requirements are much simpler.
> > > > >
> > > > > If the goal is to minimize the amount of redeliveries from broker
> ->
> > > > > client, there
> > > > > are multiple ways to achieve that with the client based approach
> (eg.
> > > send
> > > > > message id and delay time instead of the full payload to consumers
> as
> > > Ivan
> > > > > proposed).
> > > > >
> > > > > This seems to be simpler and with less overhead than having to
> > persist
> > > > > the whole
> > > > > > hash-wheel timer state into a ledger.
> > > >
> > > >
> > > > I agree with that there are many optimizations can be applied at a
> > client
> > > > side approach. In a stable world, these approaches are technically
> > > > equivalent.
> > > >
> > > > However I do not agree with a few points:
> > > >
> > > > First, based on my past production experiences, network bandwidth on
> > > broker
> > > > is the bigger cost than io cost in a multi subscription case. Also, I
> > > have
> > > > heard a few production users have experienced latency issues where
> > broker
> > > > network bandwidth is saturated. So any mechanisms that rely on
> > > redeliveries
> > > > are a big red flag to me.
> > > >
> > > > Secondly, currently pulsar is using more bandwidth on brokers, than
> > > > bandwidth on bookies. It is not a balanced state. I am more leaning
> > > towards
> > > > an approach that can leverage bookies’ idle bandwidth, rather than
> > > > potentially using more bandwidth on brokers.
> > > >
> > > > Thirdly, in my view, clock skew concern is not a technical issue,
> but a
> > > > management issue. As what Ivan and you have pointed out, there are
> many
> > > > ways on addressing clock skew. However, clock skew in a brokerside
> > > approach
> > > > is easier to manage and more predictable, but clock skew in a
> > clientside
> > > > approach is much harder to manage and more unpredictable. This
> > > > unpredictability can significantly change the io or network pattern
> > when
> > > > things go bad. When such unpredictability happens, it can cause bad
> > > things
> > > > and saturating broker network in a redeliver-ish approach. If we are
> > > > building a distributed system that can handle this unpredictability,
> a
> > > > broker-side approach is much more friendly to manageability and
> incident
> > > > management.
> > > >
> > > > Lastly, i do agree client side approaches have better scalability
> than
> > > > server side approaches in most cases. However I don’t believe that it
> > is
> > > > the case here. And I don’t see anyone have a clear explanation on
> why a
> > > > broker approach is less scalable than the client side approach.
> > > >
> > > > Anyway, for manageability, bandwidth usage, client simplicity, I am
> more
> > > in
> > > > favor of a broker side approach, or at least an approach that is not
> > > > redelivery based. However since the feature is requested by Penghui
> > > > and Ezequiel,
> > > > I am also fine with this client side approach if they are okay with
> > that.
> > > >
> > > > - Sijie
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Matteo Merli
> > > > > <ma...@gmail.com>
> > > > >
> > > > >
> > > > > On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
> > > > > <ez...@gmail.com> wrote:
> > > > > >
> > > > > > Hi All! and sorry for delay :)
> > > > > >
> > > > > > Probably I'm going to say some things already said, so sorry
> > > beforehand.
> > > > > >
> > > > > > The two main needed features I think are the proposed:
> > > > > > A. Producer delay PIP-26. B. Consumers delay PR #3155
> > > > > >
> > > > > > Of course PIP-26 would result in consumers receiving delayed
> > messages
> > > > > > but the important thing here is one of them made the decision
> about
> > > > > delay.
> > > > > >
> > > > > > First, the easy one, PR #3155. Consumers delay:
> > > > > >
> > > > > > As others have stated before, this is a more trivial approach
> > because
> > > > > > of the nature of having the exactly same period of delay for each
> > > message
> > > > > > which is predictable.
> > > > > >
> > > > > > I agree that adding logic at broker should be avoided, but, for
> > this
> > > > > > specific feature #3155 which I don't think is complex I believe
> > there
> > > > > > are others serious advantages:
> > > > > >
> > > > > >  1. Simplicity at client side, we don't need to add any code
> which
> > is
> > > > > >     less error prone.
> > > > > >  2. Clock issues from client side being outdated and causing
> > headache
> > > > > >     to users detecting this.
> > > > > >  3. Avoids huge overhead delivering non expired messages across
> the
> > > > > >     network unnecessary.
> > > > > >  4. Consumers are free to decide to consume messages with delay
> > > > > regardless
> > > > > >     of the producer.
> > > > > >  5. Delay is uniform for all messages, which sometimes is the
> > > solution
> > > > > >     to the problem rather than arbitrary delays.
> > > > > >
> > > > > > I think that would be great if pulsar can provide this kind of
> > > features
> > > > > > > without relying on users needing to know heavy details about the
> > > > > > mechanism.
> > > > > >
> > > > > > For PIP-26:
> > > > > >
> > > > > > I think we can offer this with the purpose of message's with a
> more
> > > long
> > > > > > delay in terms of time? hours / days?
> > > > > >
> > > > > > So, if this is the case, we can assume a small granularity of
> time
> > > like
> > > > > > 1 minute making ledger's representing 1 minute of time and
> > truncating
> > > > > > each time of message for it corresponding minute and storing in
> > that
> > > > > > special ledger.
> > > > > > Users wanting to receive a messages scheduled for some days in
> > future
> > > > > > rarely would care of a margin of error of 1 minute.
> > > > > >
> > > > > > Of course we need somehow make the broker aware of this in order
> to
> > > only
> > > > > > process ledger's for current corresponding minute and consume it.
> > > > > > And the broker would be the one subject to close current minute
> > > truncated
> > > > > > processed ledger.
> > > > > >
> > > > > > One problem I can think about this approach, is it painful for
> > > Bookkeeper
> > > > > > to having a lot of opened ledgers? (one for each minute per
> topic)
> > > > > >
> > > > > > Another problem here might be what happen if consumer was not
> > > started?
> > > > > > At startup time the broker should looking for potentially older
> > > ledger's
> > > > > > than its current time and this might be expensive.
> > > > > >
> > > > > > Other more trivial issue, we might need to refactor current
> > mechanism
> > > > > > which deletes closed ledgers older than the configured time on
> name
> > > > > space.
> > > > > >
> > > > > > As a final note I think that would be great to have both features
> > in
> > > > > pulsar
> > > > > > but sometimes not everything desired is achievable.
> > > > > > And please correct me if I said something senseless.
> > > > > >
> > > > > > --
> > > > > > *Ezequiel Lovelle*
> > > > > >
> > > > > >
> > > > > > On Fri, 18 Jan 2019 at 05:51, PengHui Li <
> codelipenghui@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > > So rather than specifying the absolute timestamp that the
> > message
> > > > > > > > should appear to the user, the dispatcher can specify the
> > > relative
> > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > >
> > > > > > > As matteo said the worst case would be that the applied delay
> to
> > be
> > > > > higher
> > > > > > > for some of the messages, if specify the relative delay to
> > > consumer,
> > > > > > > if consumer offline for a period of time, consumer will receive
> > > many
> > > > > > > delayed messages
> > > > > > > after connect to broker again will cause the worst case more
> > > serious.
> > > > > It's
> > > > > > > difficult to keep
> > > > > > > consumers always online.
> > > > > > >
> > > > > > > In my personal perspective, i refer to use `delay level topic`
> to
> > > > > approach
> > > > > > > smaller delays scene.
> > > > > > > e.g(10s-topic, 30s-topic), this will not be too much topic. And
> > we
> > > are
> > > > > > > using dead letter topic to simulate
> > > > > > > delay message feature, delayed topics has different delay
> level.
> > > > > > >
> > > > > > > For very long delays scene, in our practice, user may cancel it
> > or
> > > > > restart
> > > > > > > it.
> > > > > > > After previous discussions, i agree that PIP-26 will make
> broker
> > > > > > > more complexity.
> > > > > > > So I had the idea to consider as a separate mechanism.
> > > > > > >
> > > > > > >
> > > > > > > > Sijie Guo <gu...@gmail.com> wrote on Fri, Jan 18, 2019 at 3:22 PM:
> > > > > > >
> > > > > > > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <ivank@apache.org
> >
> > > wrote:
> > > > > > > >
> > > > > > > > > One thing missing from this discussion is details on the
> > > motivating
> > > > > > > > > use-case. How many delayed messages per second are we
> > > expecting?
> > > > > And
> > > > > > > > > what is the payload size?
> > > > > > > > >
> > > > > > > > > > If consumer control the delayed message specific
> execution
> > > time
> > > > > we
> > > > > > > must
> > > > > > > > > > trust clock of consumer, this can cause delayed message
> > > process
> > > > > ahead
> > > > > > > > of
> > > > > > > > > > time, some applications cannot tolerate this condition.
> > > > > > > > >
> > > > > > > > > This can be handled in a number of ways. Consumer clocks
> can
> > be
> > > > > skewed
> > > > > > > > > with regard to other clocks, but it is generally safe to
> > assume
> > > > > that
> > > > > > > > > clocks advance at the same rate, especially at the
> > granularity
> > > of a
> > > > > > > > > couple of hours.
> > > > > > > > > So rather than specifying the absolute timestamp that the
> > > message
> > > > > > > > > should appear to the user, the dispatcher can specify the
> > > relative
> > > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > > > >
> > > > > > > > > > > My concern of this category of approaches is
> "bandwidth"
> > > > > usage. It
> > > > > > > is
> > > > > > > > > > > basically trading bandwidth for complexity.
> > > > > > > > > >
> > > > > > > > > > @Sijie Guo <si...@apache.org> Agree with you, such an
> > > trading
> > > > > can
> > > > > > > > cause
> > > > > > > > > the
> > > > > > > > > > broker's out going network to be more serious.
> > > > > > > > >
> > > > > > > > > > I don't think PIP-26's approach would use less bandwidth
> in
> > > this
> > > > > > > > > regard. With PIP-26, the msg ids are stored in a ledger,
> and
> > > when
> > > > > the
> > > > > > > > > timeout triggers it dispatches? Are all the delayed message
> > > being
> > > > > > > > > cached at the broker? If so, that is using a lot of memory,
> > and
> > > > > it's
> > > > > > > > > exactly the kind of memory usage pattern that is very bad
> for
> > > JVM
> > > > > > > > > garbage collection. If not, then you have to read the
> message
> > > back
> > > > > in
> > > > > > > > > from bookkeeper, so the bandwidth usage is the same, though
> > on
> > > a
> > > > > > > > > different path.
> > > > > > > > >
> > > > > > > > > In the client side approach, the message could be cached to
> > > avoid a
> > > > > > > > > redispatch. When I was discussing with Matteo, we discussed
> > > this.
> > > > > The
> > > > > > > > > redelivery logic has to be there in any case, as any cache
> > > (broker
> > > > > or
> > > > > > > > > client side) must have a limited size.
> > > > > > > > > Another option would be to skip sending the payload for
> > delayed
> > > > > > > > > messages, and only send it when the client request
> > redelivery,
> > > but
> > > > > > > > > this has the same issue with regard to the entry likely
> > > falling out
> > > > > > > > > the cache at the broker-side.
> > > > > > > >
> > > > > > > >
> > > > > > > > There are bandwidth usage at either approaches for sure. The
> > main
> > > > > > > > difference between broker-side and client-side approaches is
> > > which
> > > > > part
> > > > > > > of
> > > > > > > > the bandwidth is used.
> > > > > > > >
> > > > > > > > In the broker-side approach, it is using the bookies egress
> and
> > > > > broker
> > > > > > > > ingress bandwidth. In a typical pulsar deployment, bookies
> > > egress is
> > > > > > > mostly
> > > > > > > > idle unless there are consumers falling behind.
> > > > > > > >
> > > > > > > > In the client-side approach, it is using broker’s egress
> > > bandwidth
> > > > > and
> > > > > > > > potentially bookies’ egress bandwidth. Brokers’ egress is
> > > critical
> > > > > since
> > > > > > > it
> > > > > > > > is shared across consumers. So if the broker egress is
> doubled,
> > > it
> > > > > is a
> > > > > > > red
> > > > > > > > flag.
> > > > > > > >
> > > > > > > > Although I agree the bandwidth usage depends on workloads.
> But
> > in
> > > > > theory,
> > > > > > > > broker-side approach is more friendly to resource usage and a
> > > better
> > > > > > > > approach to use the resources in a multi layered
> architecture.
> > > > > Because it
> > > > > > > > uses less bandwidth at broker side. A client side can cause
> > more
> > > > > > > bandwidth
> > > > > > > > usage at broker side.
> > > > > > > >
> > > > > > > > > Also, as Penghui pointed out, clock skew can be another
> > > factor
> > > > > > > causing
> > > > > > > > more traffic in a fanout case. In a broker-side approach, the
> > > > > deferred is
> > > > > > > > done in a central point, so when the deferred time point
> kicks
> > > in,
> > > > > broker
> > > > > > > > just need to read the data one time from bookies. However in
> a
> > > > > > > client-side
> > > > > > > > approach, the messages are asked by different subscriptions,
> > > > > different
> > > > > > > > subscription can ask the deferred message at any time based
> on
> > > their
> > > > > > > > clocks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > -Ivan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

Hi all,

Since this discussion has many threads, I propose the following:

Could we hold a vote, or reach some sort of decision, about what to settle
next? For example, I would leave arbitrary per-message delay for a future
discussion, because it is the more difficult feature to achieve, and decide
now whether we want to include #3155 in Pulsar, discussing its
implementation details (broker side vs client side, memory usage, etc.).

What do you all think about this?

--
*Ezequiel Lovelle*


On Sat, 19 Jan 2019 at 23:43, PengHui Li <co...@gmail.com> wrote:

> Hi All,
>
> Actually, I also prefer to simplify at broker side.
>
> If pulsar support set arbitrary timeout on each message, if not cluster
> failure or consumer failure,
> it needs to behave normally(on time). Otherwise, user need to understand
> how pulsar dispatching
> messages and how limit of unacked messages change the delay message
> behavior. This may
> lead users to hesitate, this feature may be misused when the user does not
> fully understand how it works.
>
> When user depends arbitrary timeout message feature, users just need to
> keep producer and consumer
> is work well, and administrator of pulsar need to keep pulsar cluster work
> well.
>
> I don't think pulsar is very necessary to support this feature(arbitrary
> timeout message),
> In most scenarios, #3155 can work well, In a few cases, even if support
> arbitrary timeout message in
> client side, i believe that still can not meet the requirement of all
> delayed messages.
>
> To me, i’m not against support arbitrary timeout on each message on client
> side, maybe this is useful
> for other users. In some of our scenarios, we also need a more functional
> alternative(a task service).
>
> Of course, If we can integrate a task service, we can use pulsar to
> guaranteed delivery of messages,
> task service guaranteed send message to pulsar success. Or pulsar broker
> support filter server.
> This way users can implement their own task services.
>
> Ezequiel Lovelle <ez...@gmail.com> wrote on Sun, Jan 20, 2019 at 12:28 AM:
>
> > > If the goal is to minimize the amount of redeliveries from broker ->
> > client, there are multiple ways to achieve that with the client based
> > approach
> > (eg. send message id and delay time instead of the full payload to
> > consumers
> > as Ivan proposed).
> >
> > But the main reason to put this logic on client side was not adding delay
> > related logic on broker side, in order to do this optimisations the
> broker
> > must be aware of delayed message and only send message id and delay time
> > without payload.
> >
> > > I don't necessarily agree with that. NTP is widely available
> > and understood. Any application that's doing anything time-related would
> > have
> > to make sure the clocks are reasonably synced.
> >
> > Yep, that's true, but from my point of view a system that depends on
> client
> > side clock is weaker than a system that does this kind of calculation at
> > a more controlled environment aka backend. This adds one more factor that
> > depends on the user doing things right, which is not always the case.
> >
> > One possible solution might be the broker send periodically its current
> > epoch time and the client do the calculations with this data, or send
> epoch
> > time initially at subscription and do the rest of calculations doing
> delta
> > of
> > time using the initial time from broker as a base (time flows equally for
> > both
> > the important thing is which one is positioned at the very present time).
> >
> > Anyway this mentioned approach sounds like a hack just from the fact of
> > not doing the time calculations in the backend.
> >
> > > Lastly, i do agree client side approaches have better scalability than
> > server side approaches in most cases. However I don’t believe that it is
> > the case here. And I don’t see anyone have a clear explanation on why a
> > broker approach is less scalable than the client side approach.
> >
> > Yes, I agree with this. At least for fixed time delay at pr #3155.
> >
> > The only remained concern to me would be Gc usage of stored positions
> > next to be expired, anyway, since the nature of a fixed delay and
> > from the fact that process a ledger tend to be in a sequentially manner,
> > we could store a range of positions id for some delta when intensive
> > traffic is going on, I believe I did this mention on the pr.
> >
> > > Again, in general I'm more concerned of stuff that happens in broker
> > because
> > it will have to be scaled up 10s of thousands of times in a single
> > process, while in client typically the requirements are much simpler.
> >
> > I agree that adding logic to broker should be considered with deep care,
> > but in this specific scenario at worst case we will only have one and
> only
> > one scheduled task per consumer which will take all expired positions
> > from a DelayQueue.
> >
> > --
> > *Ezequiel Lovelle*
> >
> >
> > On Sat, 19 Jan 2019 at 01:02, Matteo Merli <ma...@gmail.com>
> wrote:
> >
> > > Just a quick correction:
> > >
> > > > And I don’t see anyone have a clear explanation on why a
> > > broker approach is less scalable than the client side approach.
> > >
> > > I haven't said it less or more scalable. I was meaning that it's
> > > "easier" to scale, in that we don't have to do lots of fancy stuff
> > > and add more and more control to make sure that the implementation
> > > will not become a concern point at scale (eg: limit the overall
> > > amount of memory used in broker, across all topics, and the
> > > impact on GC of these long-living objects).
> > >
> > > > However, clock skew in a brokerside approach
> > > is easier to manage and more predictable, but clock skew in a
> clientside
> > > approach is much harder to manage and more unpredictable
> > >
> > > I don't necessarily agree with that. NTP is widely available
> > > and understood.
> > > Any application that's doing anything time-related would have
> > > to make sure the clocks are reasonably synced.
> > >
> > > --
> > > Matteo Merli
> > > <ma...@gmail.com>
> > >
> > > On Fri, Jan 18, 2019 at 7:46 PM Sijie Guo <gu...@gmail.com> wrote:
> > > >
> > > > On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <matteo.merli@gmail.com
> >
> > > wrote:
> > > >
> > > > > Trying to group and compress responses here.
> > > > >
> > > > > > If consumer control the delayed message specific execution time
> we
> > > must
> > > > > trust clock of consumer, this can cause delayed message process
> ahead
> > > of
> > > > > time, some applications cannot tolerate this condition.
> > > > >
> > > > > This is a problem that cannot be solved.
> > > > > Even assuming the timestamps are assigned by brokers and are
> > guaranteed
> > > > > to be monotonic, this won't prevent 2 brokers from having clock
> > skews.
> > > > > > That would result in different delivery delays.
> > > > >
> > > > > Similarly, the broker timestamp might be assigned later compared to
> > > when a
> > > > > publisher was "intending" to start the clock.
> > > > >
> > > > > Barring super-precise clock synchronization techniques (which are
> way
> > > out
> > > > > of the scope of this discussion), the only reasonable way to think
> > > about
> > > > > this is
> > > > > > that delays need to be orders of magnitude bigger than the
> average
> > > clock
> > > > > skew experienced with common techniques (eg: NTP). NTP clock skew
> > will
> > > > > generally be in the 10s of millis. Any delay > 1 seconds will
> hardly
> > be
> > > > > noticeably affected by these skews.
> > > > >
> > > > > Additionally, any optimization on the timeouts handling (like the
> > > > > hash-wheel
> > > > > timer proposed in PIP-26) will trade off precision for efficiency.
> In
> > > that
> > > > > case,
> > > > > the delays are managed in buckets, and can result in higher delays
> > than
> > > > > what was requested.
> > > > >
> > > > > > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the
> > > largest
> > > > > proportion in throughput of delayed message . A subscription with a
> > > fixed
> > > > > delayed time can approach to this scene.
> > > > >
> > > > > I don't think that for fixed delays, any server-side implementation
> > > > > would provide
> > > > > any advantage compared to doing:
> > > > >
> > > > > ```
> > > > > while (true) {
> > > > >     Message msg = consumer.receive();
> > > > > >     long delayMillis = calculateDelay(msg);
> > > > >     if (delayMillis > 0) {
> > > > >         Thread.sleep(delayMillis);
> > > > >     }
> > > > >
> > > > >     // Do something
> > > > >     consumer.acknowledge(msg);
> > > > > }
> > > > > ```
> > > > >
> > > > > This will not need any support from broker. Also, there will be no
> > > > > redeliveries.
> > > > >
> > > > > It could be wrapped in the client API, although I don't see that as
> > > > > big of a problem.
> > > > >
> > > > > > My concern of this category of approaches is "bandwidth" usage.
> It
> > is
> > > > > basically trading bandwidth for complexity.
> > > > >
> > > > > With mixed delays on a single topic, in any case there has to be
> some
> > > kind
> > > > > of time-based sorting of the messages that needs to happen either
> at
> > > broker
> > > > > or at client.
> > > > >
> > > > > Functionally, I believe that either place is equivalent (from a
> user
> > > > > point of view),
> > > > > barring the different implementation requirements.
> > > > >
> > > > > In my view, the bigger cost here is not bandwidth but rather the
> disk
> > > > > IO, that will
> > > > > happen exactly in the same way in both cases. Messages can be
> cached,
> > > > > up to a certain point, either in broker or in client library. After
> > > > > that, in both cases,
> > > > > the messages will have to be fetched from bookies.
> > > > >
> > > > > Also, when implementing the delay feature in the client, the
> existing
> > > > > flow control
> > > > > mechanism is naturally applied to limit the overall amount of
> > > information
> > > > > that
> > > > > we have to keep track (the "currently tracked" messages). Some
> other
> > > > > mechanism
> > > > > would have to be done in the broker as well.
> > > > >
> > > > > Again, in general I'm more concerned of stuff that happens in
> broker
> > > > > because
> > > > > it will have to be scaled up 10s of thousands of times in a single
> > > > > process, while
> > > > > in client typically the requirements are much simpler.
> > > > >
> > > > > If the goal is to minimize the amount of redeliveries from broker
> ->
> > > > > client, there
> > > > > are multiple ways to achieve that with the client based approach
> (eg.
> > > send
> > > > > message id and delay time instead of the full payload to consumers
> as
> > > Ivan
> > > > > proposed).
> > > > >
> > > > > This seems to be simpler and with less overhead than having to
> > persist
> > > > > the whole
> > > > > > hash-wheel timer state into a ledger.
> > > >
> > > >
> > > > I agree with that there are many optimizations can be applied at a
> > client
> > > > side approach. In a stable world, these approaches are technically
> > > > equivalent.
> > > >
> > > > However I do not agree with a few points:
> > > >
> > > > First, based on my past production experiences, network bandwidth on
> > > broker
> > > > is the bigger cost than io cost in a multi subscription case. Also, I
> > > have
> > > > heard a few production users have experienced latency issues where
> > broker
> > > > network bandwidth is saturated. So any mechanisms that rely on
> > > redeliveries
> > > > are a big red flag to me.
> > > >
> > > > Secondly, currently pulsar is using more bandwidth on brokers, than
> > > > bandwidth on bookies. It is not a balanced state. I am more leaning
> > > towards
> > > > an approach that can leverage bookies’ idle bandwidth, rather than
> > > > potentially using more bandwidth on brokers.
> > > >
> > > > Thirdly, in my view, clock skew concern is not a technical issue,
> but a
> > > > management issue. As what Ivan and you have pointed out, there are
> many
> > > > ways on addressing clock skew. However, clock skew in a brokerside
> > > approach
> > > > is easier to manage and more predictable, but clock skew in a
> > clientside
> > > > approach is much harder to manage and more unpredictable. This
> > > > unpredictability can significantly change the io or network pattern
> > when
> > > > things go bad. When such unpredictability happens, it can cause bad
> > > things
> > > > and saturating broker network in a redeliver-ish approach. If we are
> > > > building a distributed system that can handle this unpredictability,
> a
> > > > > broker-side approach is much more friendly to manageability and
> incident
> > > > management.
> > > >
> > > > Lastly, i do agree client side approaches have better scalability
> than
> > > > server side approaches in most cases. However I don’t believe that it
> > is
> > > > the case here. And I don’t see anyone have a clear explanation on
> why a
> > > > broker approach is less scalable than the client side approach.
> > > >
> > > > > Anyway, for manageability, bandwidth usage, client simplicity, I am
> more
> > > in
> > > > favor of a broker side approach, or at least an approach that is not
> > > > redelivery based. However since the feature is requested by Penghui
> > > > and Ezequiel,
> > > > I am also fine with this client side approach if they are okay with
> > that.
> > > >
> > > > - Sijie
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Matteo Merli
> > > > > <ma...@gmail.com>
> > > > >
> > > > >
> > > > > On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
> > > > > <ez...@gmail.com> wrote:
> > > > > >
> > > > > > Hi All! and sorry for delay :)
> > > > > >
> > > > > > Probably I'm going to say some things already said, so sorry
> > > beforehand.
> > > > > >
> > > > > > The two main needed features I think are the proposed:
> > > > > > A. Producer delay PIP-26. B. Consumers delay PR #3155
> > > > > >
> > > > > > Of course PIP-26 would result in consumers receiving delayed
> > messages
> > > > > > but the important thing here is one of them made the decision
> about
> > > > > delay.
> > > > > >
> > > > > > First, the easy one, PR #3155. Consumers delay:
> > > > > >
> > > > > > As others have stated before, this is a more trivial approach
> > because
> > > > > > of the nature of having the exactly same period of delay for each
> > > message
> > > > > > which is predictable.
> > > > > >
> > > > > > I agree that adding logic at broker should be avoided, but, for
> > this
> > > > > > specific feature #3155 which I don't think is complex I believe
> > there
> > > > > > are others serious advantages:
> > > > > >
> > > > > >  1. Simplicity at client side, we don't need to add any code
> which
> > is
> > > > > >     less error prone.
> > > > > >  2. Clock issues from client side being outdated and causing
> > headache
> > > > > >     to users detecting this.
> > > > > >  3. Avoids huge overhead delivering non expired messages across
> the
> > > > > >     network unnecessary.
> > > > > >  4. Consumers are free to decide to consume messages with delay
> > > > > regardless
> > > > > >     of the producer.
> > > > > >  5. Delay is uniform for all messages, which sometimes is the
> > > solution
> > > > > >     to the problem rather than arbitrary delays.
> > > > > >
> > > > > > I think that would be great if pulsar can provide this kind of
> > > features
> > > > > > > without relying on users needing to know heavy details about the
> > > > > > mechanism.
> > > > > >
> > > > > > For PIP-26:
> > > > > >
> > > > > > I think we can offer this with the purpose of message's with a
> more
> > > long
> > > > > > delay in terms of time? hours / days?
> > > > > >
> > > > > > So, if this is the case, we can assume a small granularity of
> time
> > > like
> > > > > > 1 minute making ledger's representing 1 minute of time and
> > truncating
> > > > > > each time of message for it corresponding minute and storing in
> > that
> > > > > > special ledger.
> > > > > > Users wanting to receive a messages scheduled for some days in
> > future
> > > > > > rarely would care of a margin of error of 1 minute.
> > > > > >
> > > > > > Of course we need somehow make the broker aware of this in order
> to
> > > only
> > > > > > process ledger's for current corresponding minute and consume it.
> > > > > > And the broker would be the one subject to close current minute
> > > truncated
> > > > > > processed ledger.
> > > > > >
> > > > > > One problem I can think about this approach, is it painful for
> > > Bookkeeper
> > > > > > to having a lot of opened ledgers? (one for each minute per
> topic)
> > > > > >
> > > > > > Another problem here might be what happen if consumer was not
> > > started?
> > > > > > At startup time the broker should looking for potentially older
> > > ledger's
> > > > > > than its current time and this might be expensive.
> > > > > >
> > > > > > Other more trivial issue, we might need to refactor current
> > mechanism
> > > > > > which deletes closed ledgers older than the configured time on
> name
> > > > > space.
> > > > > >
> > > > > > As a final note I think that would be great to have both features
> > in
> > > > > pulsar
> > > > > > but sometimes not everything desired is achievable.
> > > > > > And please correct me if I said something senseless.
> > > > > >
> > > > > > --
> > > > > > *Ezequiel Lovelle*
> > > > > >
> > > > > >
> > > > > > On Fri, 18 Jan 2019 at 05:51, PengHui Li <
> codelipenghui@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > > So rather than specifying the absolute timestamp that the
> > message
> > > > > > > > should appear to the user, the dispatcher can specify the
> > > relative
> > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > >
> > > > > > > As matteo said the worst case would be that the applied delay
> to
> > be
> > > > > higher
> > > > > > > for some of the messages, if specify the relative delay to
> > > consumer,
> > > > > > > if consumer offline for a period of time, consumer will receive
> > > many
> > > > > > > delayed messages
> > > > > > > after connect to broker again will cause the worst case more
> > > serious.
> > > > > It's
> > > > > > > difficult to keep
> > > > > > > consumers always online.
> > > > > > >
> > > > > > > In my personal perspective, i refer to use `delay level topic`
> to
> > > > > approach
> > > > > > > smaller delays scene.
> > > > > > > e.g(10s-topic, 30s-topic), this will not be too much topic. And
> > we
> > > are
> > > > > > > using dead letter topic to simulate
> > > > > > > delay message feature, delayed topics has different delay
> level.
> > > > > > >
> > > > > > > For very long delays scene, in our practice, user may cancel it
> > or
> > > > > restart
> > > > > > > it.
> > > > > > > After previous discussions, i agree that PIP-26 will make
> broker
> > > > > > > more complexity.
> > > > > > > So I had the idea to consider as a separate mechanism.
> > > > > > >
> > > > > > >
> > > > > > > > On Fri, Jan 18, 2019 at 3:22 PM, Sijie Guo <gu...@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <ivank@apache.org
> >
> > > wrote:
> > > > > > > >
> > > > > > > > > One thing missing from this discussion is details on the
> > > motivating
> > > > > > > > > use-case. How many delayed messages per second are we
> > > expecting?
> > > > > And
> > > > > > > > > what is the payload size?
> > > > > > > > >
> > > > > > > > > > If consumer control the delayed message specific
> execution
> > > time
> > > > > we
> > > > > > > must
> > > > > > > > > > trust clock of consumer, this can cause delayed message
> > > process
> > > > > ahead
> > > > > > > > of
> > > > > > > > > > time, some applications cannot tolerate this condition.
> > > > > > > > >
> > > > > > > > > This can be handled in a number of ways. Consumer clocks
> can
> > be
> > > > > skewed
> > > > > > > > > with regard to other clocks, but it is generally safe to
> > assume
> > > > > that
> > > > > > > > > clocks advance at the same rate, especially at the
> > granularity
> > > of a
> > > > > > > > > couple of hours.
> > > > > > > > > So rather than specifying the absolute timestamp that the
> > > message
> > > > > > > > > should appear to the user, the dispatcher can specify the
> > > relative
> > > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > > > >
> > > > > > > > > > > My concern of this category of approaches is
> "bandwidth"
> > > > > usage. It
> > > > > > > is
> > > > > > > > > > > basically trading bandwidth for complexity.
> > > > > > > > > >
> > > > > > > > > > @Sijie Guo <si...@apache.org> Agree with you, such an
> > > trading
> > > > > can
> > > > > > > > cause
> > > > > > > > > the
> > > > > > > > > > broker's out going network to be more serious.
> > > > > > > > >
> > > > > > > > > > I don't think PIP-26's approach would use less bandwidth
> in
> > > this
> > > > > > > > > regard. With PIP-26, the msg ids are stored in a ledger,
> and
> > > when
> > > > > the
> > > > > > > > > timeout triggers it dispatches? Are all the delayed message
> > > being
> > > > > > > > > cached at the broker? If so, that is using a lot of memory,
> > and
> > > > > it's
> > > > > > > > > exactly the kind of memory usage pattern that is very bad
> for
> > > JVM
> > > > > > > > > garbage collection. If not, then you have to read the
> message
> > > back
> > > > > in
> > > > > > > > > from bookkeeper, so the bandwidth usage is the same, though
> > on
> > > a
> > > > > > > > > different path.
> > > > > > > > >
> > > > > > > > > In the client side approach, the message could be cached to
> > > avoid a
> > > > > > > > > redispatch. When I was discussing with Matteo, we discussed
> > > this.
> > > > > The
> > > > > > > > > redelivery logic has to be there in any case, as any cache
> > > (broker
> > > > > or
> > > > > > > > > client side) must have a limited size.
> > > > > > > > > Another option would be to skip sending the payload for
> > delayed
> > > > > > > > > messages, and only send it when the client request
> > redelivery,
> > > but
> > > > > > > > > this has the same issue with regard to the entry likely
> > > falling out
> > > > > > > > > the cache at the broker-side.
> > > > > > > >
> > > > > > > >
> > > > > > > > There are bandwidth usage at either approaches for sure. The
> > main
> > > > > > > > difference between broker-side and client-side approaches is
> > > which
> > > > > part
> > > > > > > of
> > > > > > > > the bandwidth is used.
> > > > > > > >
> > > > > > > > In the broker-side approach, it is using the bookies egress
> and
> > > > > broker
> > > > > > > > ingress bandwidth. In a typical pulsar deployment, bookies
> > > egress is
> > > > > > > mostly
> > > > > > > > idle unless there are consumers falling behind.
> > > > > > > >
> > > > > > > > In the client-side approach, it is using broker’s egress
> > > bandwidth
> > > > > and
> > > > > > > > potentially bookies’ egress bandwidth. Brokers’ egress is
> > > critical
> > > > > since
> > > > > > > it
> > > > > > > > is shared across consumers. So if the broker egress is
> doubled,
> > > it
> > > > > is a
> > > > > > > red
> > > > > > > > flag.
> > > > > > > >
> > > > > > > > Although I agree the bandwidth usage depends on workloads.
> But
> > in
> > > > > theory,
> > > > > > > > broker-side approach is more friendly to resource usage and a
> > > better
> > > > > > > > approach to use the resources in a multi layered
> architecture.
> > > > > Because it
> > > > > > > > uses less bandwidth at broker side. A client side can cause
> > more
> > > > > > > bandwidth
> > > > > > > > usage at broker side.
> > > > > > > >
> > > > > > > > > Also as what penghui pointed out, clock skew can be another
> > > factor
> > > > > > > causing
> > > > > > > > more traffic in a fanout case. In a broker-side approach, the
> > > > > deferred is
> > > > > > > > done in a central point, so when the deferred time point
> kicks
> > > in,
> > > > > broker
> > > > > > > > just need to read the data one time from bookies. However in
> a
> > > > > > > client-side
> > > > > > > > approach, the messages are asked by different subscriptions,
> > > > > different
> > > > > > > > subscription can ask the deferred message at any time based
> on
> > > their
> > > > > > > > clocks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > -Ivan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by PengHui Li <co...@gmail.com>.

Hi All,

Actually, I also prefer to keep things simple on the broker side.

If Pulsar supports setting an arbitrary timeout on each message, then
unless there is a cluster failure or a consumer failure, delivery needs
to behave normally (on time). Otherwise, users need to understand how
Pulsar dispatches messages and how the limit on unacked messages changes
the delayed-message behavior. This may make users hesitate, and the
feature may be misused when the user does not fully understand how it
works.

When users depend on the arbitrary-timeout feature, they should only need
to keep their producers and consumers working well, and the Pulsar
administrators should only need to keep the cluster working well.

I don't think it is really necessary for Pulsar to support this feature
(arbitrary timeout per message). In most scenarios #3155 works well. In a
few cases, even with arbitrary timeouts supported on the client side, I
believe it still cannot meet the requirements of every delayed-message
use case.

I'm not against supporting an arbitrary timeout on each message on the
client side; it may be useful for other users. In some of our scenarios,
we also need a more fully featured alternative (a task service).

Of course, if we can integrate a task service, Pulsar can guarantee
delivery of the messages and the task service can guarantee that the
messages are sent to Pulsar successfully. Alternatively, the Pulsar
broker could support a filter server, so that users can implement their
own task services.
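
For the fixed-delay case that #3155 covers, the `calculateDelay` step in
the consumer loop quoted further down could look roughly like this. It is
only a sketch under assumptions: `FixedDelay` and its parameter names are
hypothetical, and the publish timestamp is passed in as a plain long
rather than read from a real Pulsar message.

```java
final class FixedDelay {
    // Hypothetical helper: how long a consumer still has to wait before a
    // message published at publishTimeMillis becomes visible, given one
    // fixed delay for the whole subscription. Never returns a negative value.
    static long remainingDelayMillis(long publishTimeMillis,
                                     long nowMillis,
                                     long fixedDelayMillis) {
        long deliverAtMillis = publishTimeMillis + fixedDelayMillis;
        return Math.max(0L, deliverAtMillis - nowMillis);
    }
}
```

Because the delay is the same for every message, messages become due in
publish order, so the consumer never has to reorder anything.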

On Sun, Jan 20, 2019 at 12:28 AM, Ezequiel Lovelle <ez...@gmail.com> wrote:

> > If the goal is to minimize the amount of redeliveries from broker ->
> client, there are multiple ways to achieve that with the client based
> approach
> (eg. send message id and delay time instead of the full payload to
> consumers
> as Ivan proposed).
>
> But the main reason to put this logic on the client side was to avoid adding
> delay-related logic on the broker side. To do these optimisations the broker
> must be aware of delayed messages and send only the message id and delay
> time, without the payload.
>
> > I don't necessarily agree with that. NTP is widely available
> and understood. Any application that's doing anything time-related would
> have
> to make sure the clocks are reasonably synced.
>
> Yep, that's true, but from my point of view a system that depends on the
> client side clock is weaker than a system that does this kind of calculation
> in a more controlled environment, i.e. the backend. This adds one more factor
> that depends on the user doing things right, which is not always the case.
>
> One possible solution might be for the broker to periodically send its
> current epoch time and for the client to do the calculations with it, or
> to send the epoch time once at subscription and have the client compute
> the rest from elapsed time, using the initial broker time as a base (time
> flows equally for both; the important thing is which one is positioned at
> the actual present time).
>
> Anyway, this approach sounds like a hack that exists only because the
> time calculations are not done in the backend.
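
A minimal sketch of the "broker time as a base" idea quoted above,
assuming the broker's epoch time arrives once at subscription. Nothing in
this thread defines such a field, so `BrokerClockEstimate` and all of its
parameters are hypothetical names. The sketch also ignores the network
latency between broker and client, which would add to the error.

```java
final class BrokerClockEstimate {
    private final long brokerEpochMillisAtSubscribe;
    private final long localNanosAtSubscribe;

    // brokerEpochMillis: the broker's wall clock as reported at subscription.
    // localNanos: the client's monotonic clock (e.g. System.nanoTime())
    // sampled when that report was received.
    BrokerClockEstimate(long brokerEpochMillis, long localNanos) {
        this.brokerEpochMillisAtSubscribe = brokerEpochMillis;
        this.localNanosAtSubscribe = localNanos;
    }

    // Broker "now" = base broker time + locally measured elapsed time.
    // Only the client's clock rate matters, not its wall-clock setting.
    long estimatedBrokerNowMillis(long localNanosNow) {
        long elapsedMillis = (localNanosNow - localNanosAtSubscribe) / 1_000_000L;
        return brokerEpochMillisAtSubscribe + elapsedMillis;
    }

    // Remaining wait for a message due at deliverAtBrokerMillis (broker time).
    long remainingDelayMillis(long deliverAtBrokerMillis, long localNanosNow) {
        return Math.max(0L, deliverAtBrokerMillis - estimatedBrokerNowMillis(localNanosNow));
    }
}
```

In real use the caller would pass `System.nanoTime()` for both nanosecond
arguments; they are parameters here only so the arithmetic can be checked
in isolation.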
>
> > Lastly, i do agree client side approaches have better scalability than
> server side approaches in most cases. However I don’t believe that it is
> the case here. And I don’t see anyone have a clear explanation on why a
> broker approach is less scalable than the client side approach.
>
> Yes, I agree with this. At least for fixed time delay at pr #3155.
>
> The only remaining concern to me would be the GC impact of the stored
> positions that are next to expire. Anyway, given the nature of a fixed
> delay and the fact that a ledger tends to be processed sequentially,
> we could store a range of position ids for some delta when intensive
> traffic is going on. I believe I mentioned this on the PR.
>
> > Again, in general I'm more concerned of stuff that happens in broker
> because
> it will have to be scaled up 10s of thousands of times in a single
> process, while in client typically the requirements are much simpler.
>
> I agree that adding logic to the broker should be considered with great
> care, but in this specific scenario, in the worst case we will have one and
> only one scheduled task per consumer, which will take all expired positions
> from a DelayQueue.
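
The DelayQueue idea above can be sketched with the JDK's
`java.util.concurrent.DelayQueue`. This is only an illustration, not the
code in #3155: `DelayedPosition`, `ExpiredPositions` and the
ledger/entry fields are hypothetical stand-ins for whatever the broker
actually tracks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a message position whose delivery is deferred.
final class DelayedPosition implements Delayed {
    final long ledgerId;
    final long entryId;
    private final long deliverAtNanos;

    DelayedPosition(long ledgerId, long entryId, long delayMillis) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
        this.deliverAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Zero or negative means the position is due.
        return unit.convert(deliverAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class ExpiredPositions {
    // What the single scheduled task per consumer would run on each tick:
    // drainTo only removes elements whose delay has already expired.
    static List<DelayedPosition> drainExpired(DelayQueue<DelayedPosition> queue) {
        List<DelayedPosition> expired = new ArrayList<>();
        queue.drainTo(expired);
        return expired;
    }
}
```

Positions that are not yet due stay in the queue, so its size is what
would drive the memory and GC concern raised earlier in the thread.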
>
> --
> *Ezequiel Lovelle*
>
>
> On Sat, 19 Jan 2019 at 01:02, Matteo Merli <ma...@gmail.com> wrote:
>
> > Just a quick correction:
> >
> > > And I don’t see anyone have a clear explanation on why a
> > broker approach is less scalable than the client side approach.
> >
> > I haven't said it is less or more scalable. I was meaning that it's
> > "easier" to scale, in that we don't have to do lots of fancy stuff
> > and add more and more control to make sure that the implementation
> > will not become a concern point at scale (eg: limit the overall
> > amount of memory used in broker, across all topics, and the
> > impact on GC of these long-living objects).
> >
> > > However, clock skew in a brokerside approach
> > is easier to manage and more predictable, but clock skew in a clientside
> > approach is much harder to manage and more unpredictable
> >
> > I don't necessarily agree with that. NTP is widely available
> > and understood.
> > Any application that's doing anything time-related would have
> > to make sure the clocks are reasonably synced.
> >
> > --
> > Matteo Merli
> > <ma...@gmail.com>
> >
> > On Fri, Jan 18, 2019 at 7:46 PM Sijie Guo <gu...@gmail.com> wrote:
> > >
> > > On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <ma...@gmail.com>
> > wrote:
> > >
> > > > Trying to group and compress responses here.
> > > >
> > > > > If consumer control the delayed message specific execution time we
> > must
> > > > trust clock of consumer, this can cause delayed message process ahead
> > of
> > > > time, some applications cannot tolerate this condition.
> > > >
> > > > This is a problem that cannot be solved.
> > > > Even assuming the timestamps are assigned by brokers and are
> guaranteed
> > > > to be monotonic, this won't prevent 2 brokers from having clock
> skews.
> > > > That would result in different delivery delays.
> > > >
> > > > Similarly, the broker timestamp might be assigned later compared to
> > when a
> > > > publisher was "intending" to start the clock.
> > > >
> > > > Barring super-precise clock synchronization techniques (which are way
> > out
> > > > of the scope of this discussion), the only reasonable way to think
> > about
> > > > this is
> > > > that delays need to be orders of magnitude bigger than the average
> > clock
> > > > skew experienced with common techniques (eg: NTP). NTP clock skew
> will
> > > > generally be in the 10s of millis. Any delay > 1 seconds will hardly
> be
> > > > noticeably affected by these skews.
> > > >
> > > > Additionally, any optimization on the timeouts handling (like the
> > > > hash-wheel
> > > > timer proposed in PIP-26) will trade off precision for efficiency. In
> > that
> > > > case,
> > > > the delays are managed in buckets, and can result in higher delays
> > > > than what was requested.
> > > >
> > > > > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the
> > largest
> > > > proportion in throughput of delayed message . A subscription with a
> > fixed
> > > > delayed time can approach to this scene.
> > > >
> > > > I don't think that for fixed delays, any server-side implementation
> > > > would provide
> > > > any advantage compared to doing:
> > > >
> > > > ```
> > > > while (true) {
> > > >     Message msg = consumer.receive();
> > > >     long delayMillis = calculateDelay(msg);
> > > >     if (delayMillis > 0) {
> > > >         Thread.sleep(delayMillis);
> > > >     }
> > > >
> > > >     // Do something
> > > >     consumer.acknowledge(msg);
> > > > }
> > > > ```
> > > >
> > > > This will not need any support from broker. Also, there will be no
> > > > redeliveries.
> > > >
> > > > It could be wrapped in the client API, although I don't see that as
> > > > big of a problem.
> > > >
> > > > > My concern of this category of approaches is "bandwidth" usage. It
> is
> > > > basically trading bandwidth for complexity.
> > > >
> > > > With mixed delays on a single topic, in any case there has to be some
> > kind
> > > > of time-based sorting of the messages that needs to happen either at
> > broker
> > > > or at client.
> > > >
> > > > Functionally, I believe that either place is equivalent (from a user
> > > > point of view),
> > > > barring the different implementation requirements.
> > > >
> > > > In my view, the bigger cost here is not bandwidth but rather the disk
> > > > IO, that will
> > > > happen exactly in the same way in both cases. Messages can be cached,
> > > > up to a certain point, either in broker or in client library. After
> > > > that, in both cases,
> > > > the messages will have to be fetched from bookies.
> > > >
> > > > Also, when implementing the delay feature in the client, the existing
> > > > flow control
> > > > mechanism is naturally applied to limit the overall amount of
> > information
> > > > that
> > > > we have to keep track (the "currently tracked" messages). Some other
> > > > mechanism
> > > > would have to be done in the broker as well.
> > > >
> > > > Again, in general I'm more concerned of stuff that happens in broker
> > > > because
> > > > it will have to be scaled up 10s of thousands of times in a single
> > > > process, while
> > > > in client typically the requirements are much simpler.
> > > >
> > > > If the goal is to minimize the amount of redeliveries from broker ->
> > > > client, there
> > > > are multiple ways to achieve that with the client based approach (eg.
> > send
> > > > message id and delay time instead of the full payload to consumers as
> > Ivan
> > > > proposed).
> > > >
> > > > This seems to be simpler and with less overhead than having to
> persist
> > > > the whole
> > > > hash-wheel timer state into a ledger.
> > >
> > >
> > > I agree with that there are many optimizations can be applied at a
> client
> > > side approach. In a stable world, these approaches are technically
> > > equivalent.
> > >
> > > However I do not agree with a few points:
> > >
> > > First, based on my past production experiences, network bandwidth on
> > broker
> > > is the bigger cost than io cost in a multi subscription case. Also, I
> > have
> > > heard a few production users have experienced latency issues where
> broker
> > > network bandwidth is saturated. So any mechanisms that rely on
> > redeliveries
> > > are a big red flag to me.
> > >
> > > Secondly, currently pulsar is using more bandwidth on brokers, than
> > > bandwidth on bookies. It is not a balanced state. I am more leaning
> > towards
> > > an approach that can leverage bookies’ idle bandwidth, rather than
> > > potentially using more bandwidth on brokers.
> > >
> > > Thirdly, in my view, clock skew concern is not a technical issue, but a
> > > management issue. As what Ivan and you have pointed out, there are many
> > > ways on addressing clock skew. However, clock skew in a brokerside
> > approach
> > > is easier to manage and more predictable, but clock skew in a
> clientside
> > > approach is much harder to manage and more unpredictable. This
> > > unpredictability can significantly change the io or network pattern
> when
> > > things go bad. When such unpredictability happens, it can cause bad
> > things
> > > and saturating broker network in a redeliver-ish approach. If we are
> > > building a distributed system that can handle this unpredictability, a
> > > broker-side approach is much more friendly to manageability and incident
> > > management.
> > >
> > > Lastly, i do agree client side approaches have better scalability than
> > > server side approaches in most cases. However I don’t believe that it
> is
> > > the case here. And I don’t see anyone have a clear explanation on why a
> > > broker approach is less scalable than the client side approach.
> > >
> > > Anyway, for manageability, bandwidth usage, client simplicity, I am more
> > in
> > > favor of a broker side approach, or at least an approach that is not
> > > redelivery based. However since the feature is requested by Penghui
> > > and Ezequiel,
> > > I am also fine with this client side approach if they are okay with
> that.
> > >
> > > - Sijie
> > >
> > >
> > >
> > >
> > > >
> > > >
> > > >
> > > > --
> > > > Matteo Merli
> > > > <ma...@gmail.com>
> > > >
> > > >
> > > > On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
> > > > <ez...@gmail.com> wrote:
> > > > >
> > > > > Hi All! and sorry for delay :)
> > > > >
> > > > > Probably I'm going to say some things already said, so sorry
> > beforehand.
> > > > >
> > > > > The two main needed features I think are the proposed:
> > > > > A. Producer delay PIP-26. B. Consumers delay PR #3155
> > > > >
> > > > > Of course PIP-26 would result in consumers receiving delayed
> messages
> > > > > but the important thing here is one of them made the decision about
> > > > delay.
> > > > >
> > > > > First, the easy one, PR #3155. Consumers delay:
> > > > >
> > > > > As others have stated before, this is a more trivial approach
> because
> > > > > of the nature of having the exactly same period of delay for each
> > message
> > > > > which is predictable.
> > > > >
> > > > > I agree that adding logic at broker should be avoided, but, for
> this
> > > > > specific feature #3155 which I don't think is complex I believe
> there
> > > > > are others serious advantages:
> > > > >
> > > > >  1. Simplicity at client side, we don't need to add any code which
> is
> > > > >     less error prone.
> > > > >  2. Clock issues from client side being outdated and causing
> headache
> > > > >     to users detecting this.
> > > > >  3. Avoids huge overhead delivering non expired messages across the
> > > > >     network unnecessary.
> > > > >  4. Consumers are free to decide to consume messages with delay
> > > > regardless
> > > > >     of the producer.
> > > > >  5. Delay is uniform for all messages, which sometimes is the
> > solution
> > > > >     to the problem rather than arbitrary delays.
> > > > >
> > > > > I think that would be great if pulsar can provide this kind of
> > features
> > > > > without relying on users needing to know heavy details about the
> > > > > mechanism.
> > > > >
> > > > > For PIP-26:
> > > > >
> > > > > I think we can offer this with the purpose of message's with a more
> > long
> > > > > delay in terms of time? hours / days?
> > > > >
> > > > > So, if this is the case, we can assume a small granularity of time
> > like
> > > > > 1 minute making ledger's representing 1 minute of time and
> truncating
> > > > > each time of message for it corresponding minute and storing in
> that
> > > > > special ledger.
> > > > > Users wanting to receive a messages scheduled for some days in
> future
> > > > > rarely would care of a margin of error of 1 minute.
> > > > >
> > > > > Of course we need somehow make the broker aware of this in order to
> > only
> > > > > process ledger's for current corresponding minute and consume it.
> > > > > And the broker would be the one subject to close current minute
> > truncated
> > > > > processed ledger.
> > > > >
> > > > > One problem I can think about this approach, is it painful for
> > Bookkeeper
> > > > > to having a lot of opened ledgers? (one for each minute per topic)
> > > > >
> > > > > Another problem here might be what happen if consumer was not
> > started?
> > > > > At startup time the broker should looking for potentially older
> > ledger's
> > > > > than its current time and this might be expensive.
> > > > >
> > > > > Other more trivial issue, we might need to refactor current
> mechanism
> > > > > which deletes closed ledgers older than the configured time on name
> > > > space.
> > > > >
> > > > > As a final note I think that would be great to have both features
> in
> > > > pulsar
> > > > > but sometimes not everything desired is achievable.
> > > > > And please correct me if I said something senseless.
> > > > >
> > > > > --
> > > > > *Ezequiel Lovelle*
> > > > >
> > > > >
> > > > > On Fri, 18 Jan 2019 at 05:51, PengHui Li <co...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > > So rather than specifying the absolute timestamp that the
> message
> > > > > > > should appear to the user, the dispatcher can specify the
> > relative
> > > > > > > delay after dispatch that it should appear to the user.
> > > > > >
> > > > > > As matteo said the worst case would be that the applied delay to
> be
> > > > higher
> > > > > > for some of the messages, if specify the relative delay to
> > consumer,
> > > > > > if consumer offline for a period of time, consumer will receive
> > many
> > > > > > delayed messages
> > > > > > after connect to broker again will cause the worst case more
> > serious.
> > > > It's
> > > > > > difficult to keep
> > > > > > consumers always online.
> > > > > >
> > > > > > In my personal perspective, i refer to use `delay level topic` to
> > > > approach
> > > > > > smaller delays scene.
> > > > > > e.g(10s-topic, 30s-topic), this will not be too much topic. And
> we
> > are
> > > > > > using dead letter topic to simulate
> > > > > > delay message feature, delayed topics has different delay level.
> > > > > >
> > > > > > For very long delays scene, in our practice, user may cancel it
> or
> > > > restart
> > > > > > it.
> > > > > > After previous discussions, i agree that PIP-26 will make broker
> > > > > > more complexity.
> > > > > > So I had the idea to consider as a separate mechanism.
> > > > > >
> > > > > >
> > > > > > Sijie Guo <gu...@gmail.com> 于2019年1月18日周五 下午3:22写道：
> > > > > >
> > > > > > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org>
> > wrote:
> > > > > > >
> > > > > > > > One thing missing from this discussion is details on the
> > motivating
> > > > > > > > use-case. How many delayed messages per second are we
> > expecting?
> > > > And
> > > > > > > > what is the payload size?
> > > > > > > >
> > > > > > > > > If consumer control the delayed message specific execution
> > time
> > > > we
> > > > > > must
> > > > > > > > > trust clock of consumer, this can cause delayed message
> > process
> > > > ahead
> > > > > > > of
> > > > > > > > > time, some applications cannot tolerate this condition.
> > > > > > > >
> > > > > > > > This can be handled in a number of ways. Consumer clocks can
> be
> > > > skewed
> > > > > > > > with regard to other clocks, but it is generally safe to
> assume
> > > > that
> > > > > > > > clocks advance at the same rate, especially at the
> granularity
> > of a
> > > > > > > > couple of hours.
> > > > > > > > So rather than specifying the absolute timestamp that the
> > message
> > > > > > > > should appear to the user, the dispatcher can specify the
> > relative
> > > > > > > > delay after dispatch that it should appear to the user.
> > > > > > > >
> > > > > > > > > > My concern of this category of approaches is "bandwidth"
> > > > usage. It
> > > > > > is
> > > > > > > > > > basically trading bandwidth for complexity.
> > > > > > > > >
> > > > > > > > > @Sijie Guo <si...@apache.org> Agree with you, such an
> > trading
> > > > can
> > > > > > > cause
> > > > > > > > the
> > > > > > > > > broker's out going network to be more serious.
> > > > > > > >
> > > > > > > > I don't think PIP-26's approach may not use less bandwidth in
> > this
> > > > > > > > regard. With PIP-26, the msg ids are stored in a ledger, and
> > when
> > > > the
> > > > > > > > timeout triggers it dispatches? Are all the delayed message
> > being
> > > > > > > > cached at the broker? If so, that is using a lot of memory,
> and
> > > > it's
> > > > > > > > exactly the kind of memory usage pattern that is very bad for
> > JVM
> > > > > > > > garbage collection. If not, then you have to read the message
> > back
> > > > in
> > > > > > > > from bookkeeper, so the bandwidth usage is the same, though
> on
> > a
> > > > > > > > different path.
> > > > > > > >
> > > > > > > > In the client side approach, the message could be cached to
> > avoid a
> > > > > > > > redispatch. When I was discussing with Matteo, we discussed
> > this.
> > > > The
> > > > > > > > redelivery logic has to be there in any case, as any cache
> > (broker
> > > > or
> > > > > > > > client side) must have a limited size.
> > > > > > > > Another option would be to skip sending the payload for
> delayed
> > > > > > > > messages, and only send it when the client request
> redelivery,
> > but
> > > > > > > > this has the same issue with regard to the entry likely
> > falling out
> > > > > > > > the cache at the broker-side.
> > > > > > >
> > > > > > >
> > > > > > > There are bandwidth usage at either approaches for sure. The
> main
> > > > > > > difference between broker-side and client-side approaches is
> > which
> > > > part
> > > > > > of
> > > > > > > the bandwidth is used.
> > > > > > >
> > > > > > > In the broker-side approach, it is using the bookies egress and
> > > > broker
> > > > > > > ingress bandwidth. In a typical pulsar deployment, bookies
> > egress is
> > > > > > mostly
> > > > > > > idle unless there are consumers falling behind.
> > > > > > >
> > > > > > > In the client-side approach, it is using broker’s egress
> > bandwidth
> > > > and
> > > > > > > potentially bookies’ egress bandwidth. Brokers’ egress is
> > critical
> > > > since
> > > > > > it
> > > > > > > is shared across consumers. So if the broker egress is doubled,
> > it
> > > > is a
> > > > > > red
> > > > > > > flag.
> > > > > > >
> > > > > > > Although I agree the bandwidth usage depends on workloads. But
> in
> > > > theory,
> > > > > > > broker-side approach is more friendly to resource usage and a
> > better
> > > > > > > approach to use the resources in a multi layered architecture.
> > > > Because it
> > > > > > > uses less bandwidth at broker side. A client side can cause
> more
> > > > > > bandwidth
> > > > > > > usage at broker side.
> > > > > > >
> > > > > > > > Also as what penghui pointed out, clock skew can be another
> > factor
> > > > > > causing
> > > > > > > more traffic in a fanout case. In a broker-side approach, the
> > > > deferred is
> > > > > > > done in a central point, so when the deferred time point kicks
> > in,
> > > > broker
> > > > > > > just need to read the data one time from bookies. However in a
> > > > > > client-side
> > > > > > > approach, the messages are asked by different subscriptions,
> > > > different
> > > > > > > subscription can ask the deferred message at any time based on
> > their
> > > > > > > clocks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > -Ivan
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

> If the goal is to minimize the amount of redeliveries from broker ->
> client, there are multiple ways to achieve that with the client based
> approach (eg. send message id and delay time instead of the full payload
> to consumers as Ivan proposed).

But the main reason to put this logic on the client side was to avoid adding
delay-related logic to the broker. To apply this optimisation, the broker
must be aware of delayed messages and send only the message id and delay
time, without the payload.

> I don't necessarily agree with that. NTP is widely available
> and understood. Any application that's doing anything time-related would
> have to make sure the clocks are reasonably synced.

Yep, that's true, but from my point of view a system that depends on the
client-side clock is weaker than a system that does this kind of calculation
in a more controlled environment, i.e. the backend. It adds one more factor
that depends on the user doing things right, which is not always the case.

One possible solution might be for the broker to periodically send its
current epoch time and for the client to do the calculations with this data.
Alternatively, the broker could send its epoch time once at subscription, and
the client could do the remaining calculations as a delta from that initial
broker time (time flows at the same rate for both; what matters is which
clock is taken as the reference for the present time).

Anyway, this approach sounds like a hack that exists only because the time
calculations are not done in the backend.
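
To make the delta idea concrete, here is a minimal sketch. It assumes the
broker hands the client one epoch timestamp at subscription time; the class
name BrokerClock and its methods are hypothetical and not part of any Pulsar
API. The client pairs that timestamp with a local monotonic reading (for
example System.nanoTime()) and derives the broker's "now" from the elapsed
local time, so the client's wall clock is never consulted.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: estimates broker time from a single broker timestamp. */
final class BrokerClock {
    private final long brokerEpochMillisAtSubscribe; // sent by the broker (assumed)
    private final long localNanosAtSubscribe;        // local monotonic reading taken at the same moment

    BrokerClock(long brokerEpochMillisAtSubscribe, long localNanosAtSubscribe) {
        this.brokerEpochMillisAtSubscribe = brokerEpochMillisAtSubscribe;
        this.localNanosAtSubscribe = localNanosAtSubscribe;
    }

    /** Estimated broker epoch time for a given local monotonic reading. */
    long brokerNowMillis(long localNanos) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(localNanos - localNanosAtSubscribe);
        return brokerEpochMillisAtSubscribe + elapsedMillis;
    }

    /** Remaining delay before a message becomes visible, never negative. */
    long remainingDelayMillis(long deliverAtBrokerEpochMillis, long localNanos) {
        return Math.max(0L, deliverAtBrokerEpochMillis - brokerNowMillis(localNanos));
    }
}
```

Note that this ignores the network latency between the broker taking its
timestamp and the client receiving it, so the estimate lags by roughly that
amount.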

> Lastly, i do agree client side approaches have better scalability than
> server side approaches in most cases. However I don’t believe that it is
> the case here. And I don’t see anyone have a clear explanation on why a
> broker approach is less scalable than the client side approach.

Yes, I agree with this. At least for the fixed time delay in PR #3155.

My only remaining concern would be the GC impact of the stored positions
that are about to expire. However, given the fixed delay and the fact that a
ledger tends to be processed sequentially, we could store a range of
position ids per time delta when traffic is intensive. I believe I
mentioned this on the PR.
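
As a rough illustration of the range idea, here is a sketch of one possible
reading of it, not the code in the PR. The names PositionRangeTracker, Range
and bucketMillis are hypothetical. Consecutive entry ids from the same ledger
that arrive within the same time bucket are merged into a single range
object, so heavy sequential traffic produces one long-lived object per bucket
rather than one per message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical tracker that coalesces consecutive positions into ranges. */
final class PositionRangeTracker {
    static final class Range {
        final long ledgerId;
        final long firstEntryId;
        long lastEntryId;
        final long firstSeenMillis;

        Range(long ledgerId, long entryId, long nowMillis) {
            this.ledgerId = ledgerId;
            this.firstEntryId = entryId;
            this.lastEntryId = entryId;
            this.firstSeenMillis = nowMillis;
        }
    }

    private final Deque<Range> ranges = new ArrayDeque<>();
    private final long bucketMillis;

    PositionRangeTracker(long bucketMillis) {
        this.bucketMillis = bucketMillis;
    }

    /** Extends the newest range when the entry is contiguous and still inside the bucket. */
    void add(long ledgerId, long entryId, long nowMillis) {
        Range last = ranges.peekLast();
        boolean contiguous = last != null
                && last.ledgerId == ledgerId
                && entryId == last.lastEntryId + 1;
        if (contiguous && nowMillis - last.firstSeenMillis < bucketMillis) {
            last.lastEntryId = entryId;
        } else {
            ranges.addLast(new Range(ledgerId, entryId, nowMillis));
        }
    }

    int rangeCount() {
        return ranges.size();
    }

    Range oldest() {
        return ranges.peekFirst();
    }
}
```

The trade-off is precision: every message in a range becomes deliverable
relative to the range's first arrival time, so the error is bounded by
bucketMillis.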

> Again, in general I'm more concerned of stuff that happens in broker
> because it will have to be scaled up 10s of thousands of times in a single
> process, while in client typically the requirements are much simpler.

I agree that adding logic to the broker should be considered with great care,
but in this specific scenario, in the worst case we will have one and only
one scheduled task per consumer, which will take all expired positions
from a DelayQueue.
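
A minimal sketch of that shape, using java.util.concurrent.DelayQueue. The
class names DelayedPosition and DelayedPositionTracker are hypothetical and
this is not the code from the PR. Positions are queued with their target
time, and a single periodic task drains whatever has expired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical queue element: a message position plus its delivery time. */
final class DelayedPosition implements Delayed {
    final long ledgerId;
    final long entryId;
    final long deliverAtMillis;

    DelayedPosition(long ledgerId, long entryId, long deliverAtMillis) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
        this.deliverAtMillis = deliverAtMillis;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Zero or negative means the element has expired and can be taken.
        return unit.convert(deliverAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
}

/** Hypothetical per-consumer tracker, polled by one scheduled task. */
final class DelayedPositionTracker {
    private final DelayQueue<DelayedPosition> queue = new DelayQueue<>();

    void track(long ledgerId, long entryId, long delayMillis) {
        queue.add(new DelayedPosition(ledgerId, entryId, System.currentTimeMillis() + delayMillis));
    }

    /** Removes and returns every expired position; unexpired ones stay queued. */
    List<DelayedPosition> drainExpired() {
        List<DelayedPosition> expired = new ArrayList<>();
        queue.drainTo(expired); // DelayQueue only hands out expired elements
        return expired;
    }

    int pending() {
        return queue.size();
    }
}
```

The single task could be registered with
ScheduledExecutorService.scheduleAtFixedRate and call drainExpired() on each
tick. Memory still grows with the number of pending positions, which is the
GC concern mentioned above.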

--
*Ezequiel Lovelle*


On Sat, 19 Jan 2019 at 01:02, Matteo Merli <ma...@gmail.com> wrote:

> Just a quick correction:
>
> > And I don’t see anyone have a clear explanation on why a
> broker approach is less scalable than the client side approach.
>
> I haven't said it less or more scalable. I was meaning that it's
> "easier" to scale, in that we don't have to do lots of fancy stuff
> and add more and more control to make sure that the implementation
> will not become a concern point at scale (eg: limit the overall
> amount of memory used in broker, across all topics, and the
> impact on GC of these long-living objects).
>
> > However, clock skew in a brokerside approach
> is easier to manage and more predictable, but clock skew in a clientside
> approach is much harder to manage and more unpredictable
>
> I don't necessarily agree with that. NTP is widely available
> and understood.
> Any application that's doing anything time-related would have
> to make sure the clocks are reasonably synced.
>
> --
> Matteo Merli
> <ma...@gmail.com>
>
> On Fri, Jan 18, 2019 at 7:46 PM Sijie Guo <gu...@gmail.com> wrote:
> >
> > On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <ma...@gmail.com>
> wrote:
> >
> > > Trying to group and compress responses here.
> > >
> > > > If consumer control the delayed message specific execution time we
> must
> > > trust clock of consumer, this can cause delayed message process ahead
> of
> > > time, some applications cannot tolerate this condition.
> > >
> > > This is a problem that cannot be solved.
> > > Even assuming the timestamps are assigned by brokers and are guaranteed
> > > to be monotonic, this won't prevent 2 brokers from having clock skews.
> > > That would results in different delivery delays.
> > >
> > > Similarly, the broker timestamp might be assigned later compared to
> when a
> > > publisher was "intending" to start the clock.
> > >
> > > Barring super-precise clock synchronization techniques (which are way
> out
> > > of the scope of this discussion), the only reasonable way to think
> about
> > > this is
> > > that delays needs to be orders of magnitudes bigger than the average
> clock
> > > skew experienced with common techniques (eg: NTP). NTP clock skew will
> > > generally be in the 10s of millis. Any delay > 1 seconds will hardly be
> > > noticeably affected by these skews.
> > >
> > > Additionally, any optimization on the timeouts handling (like the
> > > hash-wheel
> > > timer proposed in PIP-26) will trade off precision for efficiency. In
> that
> > > case,
> > > the delays are managed in buckets, and can result in higher delays that
> > > what was requested.
> > >
> > > > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the
> largest
> > > proportion in throughput of delayed message . A subscription with a
> fixed
> > > delayed time can approach to this scene.
> > >
> > > I don't think that for fixed delays, any server-side implementation
> > > would provide
> > > any advantage compared to doing:
> > >
> > > ```
> > > while (true) {
> > >     Message msg = consumer.receive();
> > > >     long delayMillis = calculateDelay(msg);
> > >     if (delayMillis > 0) {
> > >         Thread.sleep(delayMillis);
> > >     }
> > >
> > >     // Do something
> > >     consumer.acknowledge(msg);
> > > }
> > > ```
> > >
> > > This will not need any support from broker. Also, there will be no
> > > redeliveries.
> > >
> > > It could be wrapped in the client API, although I don't see that as
> > > big of a problem.
> > >
> > > > My concern of this category of approaches is "bandwidth" usage. It is
> > > basically trading bandwidth for complexity.
> > >
> > > With mixed delays on a single topic, in any case there has to be some
> kind
> > > of time-based sorting of the messages that needs to happen either at
> broker
> > > or at client.
> > >
> > > Functionally, I believe that either place is equivalent (from a user
> > > point of view),
> > > barring the different implementation requirements.
> > >
> > > In my view, the bigger cost here is not bandwidth but rather the disk
> > > IO, that will
> > > happen exactly in the same way in both cases. Messages can be cached,
> > > up to a certain point, either in broker or in client library. After
> > > that, in both cases,
> > > the messages will have to be fetched from bookies.
> > >
> > > Also, when implementing the delay feature in the client, the existing
> > > flow control
> > > mechanism is naturally applied to limit the overall amount of
> information
> > > that
> > > we have to keep track (the "currently tracked" messages). Some other
> > > mechanism
> > > would have to be done in the broker as well.
> > >
> > > Again, in general I'm more concerned of stuff that happens in broker
> > > because
> > > it will have to be scaled up 10s of thousands of times in a single
> > > process, while
> > > in client typically the requirements are much simpler.
> > >
> > > If the goal is to minimize the amount of redeliveries from broker ->
> > > client, there
> > > are multiple ways to achieve that with the client based approach (eg.
> send
> > > message id and delay time instead of the full payload to consumers as
> Ivan
> > > proposed).
> > >
> > > This seems to be simpler and with less overhead than having to persist
> > > the whole
> > > > hash-wheel timer state into a ledger.
> >
> >
> > I agree with that there are many optimizations can be applied at a client
> > side approach. In a stable world, these approaches are technically
> > equivalent.
> >
> > However I do not agree with a few points:
> >
> > First, based on my past production experiences, network bandwidth on
> broker
> > is the bigger cost than io cost in a multi subscription case. Also, I
> have
> > heard a few production users have experienced latency issues where broker
> > network bandwidth is saturated. So any mechanisms that rely on
> redeliveries
> > are a big red flag to me.
> >
> > Secondly, currently pulsar is using more bandwidth on brokers, than
> > bandwidth on bookies. It is not a balanced state. I am more leaning
> towards
> > an approach that can leverage bookies’ idle bandwidth, rather than
> > potentially using more bandwidth on brokers.
> >
> > Thirdly, in my view, clock skew concern is not a technical issue, but a
> > management issue. As what Ivan and you have pointed out, there are many
> > ways on addressing clock skew. However, clock skew in a brokerside
> approach
> > is easier to manage and more predictable, but clock skew in a clientside
> > approach is much harder to manage and more unpredictable. This
> > unpredictability can significantly change the io or network pattern when
> > things go bad. When such unpredictability happens, it can cause bad
> things
> > and saturating broker network in a redeliver-ish approach. If we are
> > building a distributed system that can handle this unpredictability, a
> > > broker-side approach is much more friendly to manageability and incident
> > management.
> >
> > Lastly, i do agree client side approaches have better scalability than
> > server side approaches in most cases. However I don’t believe that it is
> > the case here. And I don’t see anyone have a clear explanation on why a
> > broker approach is less scalable than the client side approach.
> >
> > > Anyway, for manageability, bandwidth usage, client simplicity, I am more
> in
> > favor of a broker side approach, or at least an approach that is not
> > redelivery based. However since the feature is requested by Penghui
> > and Ezequiel,
> > I am also fine with this client side approach if they are okay with that.
> >
> > - Sijie
> >
> >
> >
> >
> > >
> > >
> > >
> > > --
> > > Matteo Merli
> > > <ma...@gmail.com>
> > >
> > >
> > > On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
> > > <ez...@gmail.com> wrote:
> > > >
> > > > Hi All! and sorry for delay :)
> > > >
> > > > Probably I'm going to say some things already said, so sorry
> beforehand.
> > > >
> > > > The two main needed features I think are the proposed:
> > > > A. Producer delay PIP-26. B. Consumers delay PR #3155
> > > >
> > > > Of course PIP-26 would result in consumers receiving delayed messages
> > > > but the important thing here is one of them made the decision about
> > > delay.
> > > >
> > > > First, the easy one, PR #3155. Consumers delay:
> > > >
> > > > As others have stated before, this is a more trivial approach because
> > > > of the nature of having the exactly same period of delay for each
> message
> > > > which is predictable.
> > > >
> > > > I agree that adding logic at broker should be avoided, but, for this
> > > > specific feature #3155 which I don't think is complex I believe there
> > > > are others serious advantages:
> > > >
> > > >  1. Simplicity at client side, we don't need to add any code which is
> > > >     less error prone.
> > > >  2. Clock issues from client side being outdated and causing headache
> > > >     to users detecting this.
> > > >  3. Avoids huge overhead delivering non expired messages across the
> > > >     network unnecessary.
> > > >  4. Consumers are free to decide to consume messages with delay
> > > regardless
> > > >     of the producer.
> > > >  5. Delay is uniform for all messages, which sometimes is the
> solution
> > > >     to the problem rather than arbitrary delays.
> > > >
> > > > I think that would be great if pulsar can provide this kind of
> features
> > > > without relaying on users needing to know heavy details about the
> > > > mechanism.
> > > >
> > > > For PIP-26:
> > > >
> > > > I think we can offer this with the purpose of message's with a more
> long
> > > > delay in terms of time? hours / days?
> > > >
> > > > So, if this is the case, we can assume a small granularity of time
> like
> > > > 1 minute making ledger's representing 1 minute of time and truncating
> > > > each time of message for it corresponding minute and storing in that
> > > > special ledger.
> > > > Users wanting to receive a messages scheduled for some days in future
> > > > rarely would care of a margin of error of 1 minute.
> > > >
> > > > Of course we need somehow make the broker aware of this in order to
> only
> > > > process ledger's for current corresponding minute and consume it.
> > > > And the broker would be the one subject to close current minute
> truncated
> > > > processed ledger.
> > > >
> > > > One problem I can think about this approach, is it painful for
> Bookkeeper
> > > > to having a lot of opened ledgers? (one for each minute per topic)
> > > >
> > > > Another problem here might be what happen if consumer was not
> started?
> > > > At startup time the broker should looking for potentially older
> ledger's
> > > > than its current time and this might be expensive.
> > > >
> > > > Other more trivial issue, we might need to refactor current mechanism
> > > > which deletes closed ledgers older than the configured time on name
> > > space.
> > > >
> > > > As a final note I think that would be great to have both features in
> > > pulsar
> > > > but sometimes not everything desired is achievable.
> > > > And please correct me if I said something senseless.
> > > >
> > > > --
> > > > *Ezequiel Lovelle*
> > > >
> > > >
> > > > On Fri, 18 Jan 2019 at 05:51, PengHui Li <co...@gmail.com>
> > > wrote:
> > > >
> > > > > > So rather than specifying the absolute timestamp that the message
> > > > > > should appear to the user, the dispatcher can specify the
> relative
> > > > > > delay after dispatch that it should appear to the user.
> > > > >
> > > > > As matteo said the worst case would be that the applied delay to be
> > > higher
> > > > > for some of the messages, if specify the relative delay to
> consumer,
> > > > > if consumer offline for a period of time, consumer will receive
> many
> > > > > delayed messages
> > > > > after connect to broker again will cause the worst case more
> serious.
> > > It's
> > > > > difficult to keep
> > > > > consumers always online.
> > > > >
> > > > > In my personal perspective, i refer to use `delay level topic` to
> > > approach
> > > > > smaller delays scene.
> > > > > e.g(10s-topic, 30s-topic), this will not be too much topic. And we
> are
> > > > > using dead letter topic to simulate
> > > > > delay message feature, delayed topics has different delay level.
> > > > >
> > > > > For very long delays scene, in our practice, user may cancel it or
> > > restart
> > > > > it.
> > > > > After previous discussions, i agree that PIP-26 will make broker
> > > > > more complexity.
> > > > > So I had the idea to consider as a separate mechanism.
> > > > >
> > > > >
> > > > > Sijie Guo <gu...@gmail.com> 于2019年1月18日周五 下午3:22写道：
> > > > >
> > > > > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org>
> wrote:
> > > > > >
> > > > > > > One thing missing from this discussion is details on the
> motivating
> > > > > > > use-case. How many delayed messages per second are we
> expecting?
> > > And
> > > > > > > what is the payload size?
> > > > > > >
> > > > > > > > If consumer control the delayed message specific execution
> time
> > > we
> > > > > must
> > > > > > > > trust clock of consumer, this can cause delayed message
> process
> > > ahead
> > > > > > of
> > > > > > > > time, some applications cannot tolerate this condition.
> > > > > > >
> > > > > > > This can be handled in a number of ways. Consumer clocks can be
> > > skewed
> > > > > > > with regard to other clocks, but it is generally safe to assume
> > > that
> > > > > > > clocks advance at the same rate, especially at the granularity
> of a
> > > > > > > couple of hours.
> > > > > > > So rather than specifying the absolute timestamp that the
> message
> > > > > > > should appear to the user, the dispatcher can specify the
> relative
> > > > > > > delay after dispatch that it should appear to the user.
> > > > > > >
> > > > > > > > > My concern of this category of approaches is "bandwidth"
> > > usage. It
> > > > > is
> > > > > > > > > basically trading bandwidth for complexity.
> > > > > > > >
> > > > > > > > @Sijie Guo <si...@apache.org> Agree with you, such an
> trading
> > > can
> > > > > > cause
> > > > > > > the
> > > > > > > > broker's out going network to be more serious.
> > > > > > >
> > > > > > > I don't think PIP-26's approach may not use less bandwidth in
> this
> > > > > > > regard. With PIP-26, the msg ids are stored in a ledger, and
> when
> > > the
> > > > > > > timeout triggers it dispatches? Are all the delayed message
> being
> > > > > > > cached at the broker? If so, that is using a lot of memory, and
> > > it's
> > > > > > > exactly the kind of memory usage pattern that is very bad for
> JVM
> > > > > > > garbage collection. If not, then you have to read the message
> back
> > > in
> > > > > > > from bookkeeper, so the bandwidth usage is the same, though on
> a
> > > > > > > different path.
> > > > > > >
> > > > > > > In the client side approach, the message could be cached to
> avoid a
> > > > > > > redispatch. When I was discussing with Matteo, we discussed
> this.
> > > The
> > > > > > > redelivery logic has to be there in any case, as any cache
> (broker
> > > or
> > > > > > > client side) must have a limited size.
> > > > > > > Another option would be to skip sending the payload for delayed
> > > > > > > messages, and only send it when the client request redelivery,
> but
> > > > > > > this has the same issue with regard to the entry likely
> falling out
> > > > > > > the cache at the broker-side.
> > > > > >
> > > > > >
> > > > > > There are bandwidth usage at either approaches for sure. The main
> > > > > > difference between broker-side and client-side approaches is
> which
> > > part
> > > > > of
> > > > > > the bandwidth is used.
> > > > > >
> > > > > > In the broker-side approach, it is using the bookies egress and
> > > broker
> > > > > > ingress bandwidth. In a typical pulsar deployment, bookies
> egress is
> > > > > mostly
> > > > > > idle unless there are consumers falling behind.
> > > > > >
> > > > > > In the client-side approach, it is using broker’s egress
> bandwidth
> > > and
> > > > > > potentially bookies’ egress bandwidth. Brokers’ egress is
> critical
> > > since
> > > > > it
> > > > > > is shared across consumers. So if the broker egress is doubled,
> it
> > > is a
> > > > > red
> > > > > > flag.
> > > > > >
> > > > > > Although I agree the bandwidth usage depends on workloads. But in
> > > theory,
> > > > > > broker-side approach is more friendly to resource usage and a
> better
> > > > > > approach to use the resources in a multi layered architecture.
> > > Because it
> > > > > > uses less bandwidth at broker side. A client side can cause more
> > > > > bandwidth
> > > > > > usage at broker side.
> > > > > >
> > > > > > > Also as what penghui pointed out, clock skew can be another
> factor
> > > > > causing
> > > > > > more traffic in a fanout case. In a broker-side approach, the
> > > deferred is
> > > > > > done in a central point, so when the deferred time point kicks
> in,
> > > broker
> > > > > > just need to read the data one time from bookies. However in a
> > > > > client-side
> > > > > > approach, the messages are asked by different subscriptions,
> > > different
> > > > > > subscription can ask the deferred message at any time based on
> their
> > > > > > clocks.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > -Ivan
> > > > > > >
> > > > > >
> > > > >
> > >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Matteo Merli <ma...@gmail.com>.

Just a quick correction:

> And I don’t see anyone have a clear explanation on why a
> broker approach is less scalable than the client side approach.

I haven't said it's less or more scalable. I meant that it's
"easier" to scale, in that we don't have to do lots of fancy stuff
and add more and more controls to make sure that the implementation
will not become a point of concern at scale (eg: limiting the overall
amount of memory used in the broker, across all topics, and the
impact on GC of these long-lived objects).

> However, clock skew in a brokerside approach
> is easier to manage and more predictable, but clock skew in a clientside
> approach is much harder to manage and more unpredictable

I don't necessarily agree with that. NTP is widely available
and understood.
Any application that's doing anything time-related would have
to make sure the clocks are reasonably synced.

--
Matteo Merli
<ma...@gmail.com>

On Fri, Jan 18, 2019 at 7:46 PM Sijie Guo <gu...@gmail.com> wrote:
>
> On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <ma...@gmail.com> wrote:
>
> > Trying to group and compress responses here.
> >
> > > If consumer control the delayed message specific execution time we must
> > trust clock of consumer, this can cause delayed message process ahead of
> > time, some applications cannot tolerate this condition.
> >
> > This is a problem that cannot be solved.
> > Even assuming the timestamps are assigned by brokers and are guaranteed
> > to be monotonic, this won't prevent 2 brokers from having clock skews.
> > That would results in different delivery delays.
> >
> > Similarly, the broker timestamp might be assigned later compared to when a
> > publisher was "intending" to start the clock.
> >
> > Barring super-precise clock synchronization techniques (which are way out
> > of the scope of this discussion), the only reasonable way to think about
> > this is
> > that delays needs to be orders of magnitudes bigger than the average clock
> > skew experienced with common techniques (eg: NTP). NTP clock skew will
> > generally be in the 10s of millis. Any delay > 1 seconds will hardly be
> > noticeably affected by these skews.
> >
> > Additionally, any optimization on the timeouts handling (like the
> > hash-wheel
> > timer proposed in PIP-26) will trade off precision for efficiency. In that
> > case,
> > the delays are managed in buckets, and can result in higher delays that
> > what was requested.
> >
> > > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the largest
> > proportion in throughput of delayed message . A subscription with a fixed
> > delayed time can approach to this scene.
> >
> > I don't think that for fixed delays, any server-side implementation
> > would provide
> > any advantage compared to doing:
> >
> > ```
> > while (true) {
> >     Message msg = consumer.receive();
> > >     long delayMillis = calculateDelay(msg);
> >     if (delayMillis > 0) {
> >         Thread.sleep(delayMillis);
> >     }
> >
> >     // Do something
> >     consumer.acknowledge(msg);
> > }
> > ```
> >
> > This will not need any support from broker. Also, there will be no
> > redeliveries.
> >
> > It could be wrapped in the client API, although I don't see that as
> > big of a problem.
> >
> > > My concern of this category of approaches is "bandwidth" usage. It is
> > basically trading bandwidth for complexity.
> >
> > With mixed delays on a single topic, in any case there has to be some kind
> > of time-based sorting of the messages that needs to happen either at broker
> > or at client.
> >
> > Functionally, I believe that either place is equivalent (from a user
> > point of view),
> > barring the different implementation requirements.
> >
> > In my view, the bigger cost here is not bandwidth but rather the disk
> > IO, that will
> > happen exactly in the same way in both cases. Messages can be cached,
> > up to a certain point, either in broker or in client library. After
> > that, in both cases,
> > the messages will have to be fetched from bookies.
> >
> > Also, when implementing the delay feature in the client, the existing
> > flow control
> > mechanism is naturally applied to limit the overall amount of information
> > that
> > we have to keep track (the "currently tracked" messages). Some other
> > mechanism
> > would have to be done in the broker as well.
> >
> > Again, in general I'm more concerned of stuff that happens in broker
> > because
> > it will have to be scaled up 10s of thousands of times in a single
> > process, while
> > in client typically the requirements are much simpler.
> >
> > If the goal is to minimize the amount of redeliveries from broker ->
> > client, there
> > are multiple ways to achieve that with the client based approach (eg. send
> > message id and delay time instead of the full payload to consumers as Ivan
> > proposed).
> >
> > This seems to be simpler and with less overhead than having to persist
> > the whole hash-wheel timer state into a ledger.
>
>
> I agree with that there are many optimizations can be applied at a client
> side approach. In a stable world, these approaches are technically
> equivalent.
>
> However I do not agree with a few points:
>
> First, based on my past production experiences, network bandwidth on broker
> is the bigger cost than io cost in a multi subscription case. Also, I have
> heard a few production users have experienced latency issues where broker
> network bandwidth is saturated. So any mechanisms that rely on redeliveries
> are a big red flag to me.
>
> Secondly, currently pulsar is using more bandwidth on brokers, than
> bandwidth on bookies. It is not a balanced state. I am more leaning towards
> an approach that can leverage bookies’ idle bandwidth, rather than
> potentially using more bandwidth on brokers.
>
> Thirdly, in my view, clock skew concern is not a technical issue, but a
> management issue. As what Ivan and you have pointed out, there are many
> ways on addressing clock skew. However, clock skew in a brokerside approach
> is easier to manage and more predictable, but clock skew in a clientside
> approach is much harder to manage and more unpredictable. This
> unpredictability can significantly change the io or network pattern when
> things go bad. When such unpredictability happens, it can cause bad things
> and saturating broker network in a redeliver-ish approach. If we are
> building a distributed system that can handle this unpredictability, a
> broker-side approach is much more friendly to manageability and incident
> management.
>
> Lastly, i do agree client side approaches have better scalability than
> server side approaches in most cases. However I don’t believe that it is
> the case here. And I don’t see anyone have a clear explanation on why a
> broker approach is less scalable than the client side approach.
>
> Anyway, for manageability, bandwidth usage, client simplicity, I am more in
> favor of a broker side approach, or at least an approach that is not
> redelivery based. However since the feature is requested by Penghui
> and Ezequiel,
> I am also fine with this client side approach if they are okay with that.
>
> - Sijie
>

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

On Sat, Jan 19, 2019 at 9:45 AM Matteo Merli <ma...@gmail.com> wrote:

> Trying to group and compress responses here.
>
> > If consumer control the delayed message specific execution time we must
> > trust clock of consumer, this can cause delayed message process ahead of
> > time, some applications cannot tolerate this condition.
>
> This is a problem that cannot be solved.
> Even assuming the timestamps are assigned by brokers and are guaranteed
> to be monotonic, this won't prevent 2 brokers from having clock skews.
> That would result in different delivery delays.
>
> Similarly, the broker timestamp might be assigned later compared to when a
> publisher was "intending" to start the clock.
>
> Barring super-precise clock synchronization techniques (which are way out
> of the scope of this discussion), the only reasonable way to think about
> this is that delays need to be orders of magnitude bigger than the average
> clock skew experienced with common techniques (e.g. NTP). NTP clock skew will
> generally be in the 10s of millis. Any delay > 1 second will hardly be
> noticeably affected by these skews.
>
> Additionally, any optimization of the timeout handling (like the
> hash-wheel timer proposed in PIP-26) will trade off precision for
> efficiency. In that case, the delays are managed in buckets, and can
> result in higher delays than what was requested.
>
> > 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the largest
> proportion in throughput of delayed message . A subscription with a fixed
> delayed time can approach to this scene.
>
> I don't think that for fixed delays, any server-side implementation
> would provide
> any advantage compared to doing:
>
> ```
> while (true) {
>     Message msg = consumer.receive();
>     long delayMillis = calculateDelay(msg);
>     if (delayMillis > 0) {
>         Thread.sleep(delayMillis);
>     }
>
>     // Do something
>     consumer.acknowledge(msg);
> }
> ```
>
> This will not need any support from broker. Also, there will be no
> redeliveries.
>
> It could be wrapped in the client API, although I don't see that as
> big of a problem.
>
> > My concern of this category of approaches is "bandwidth" usage. It is
> basically trading bandwidth for complexity.
>
> With mixed delays on a single topic, in any case there has to be some kind
> of time-based sorting of the messages that needs to happen either at broker
> or at client.
>
> Functionally, I believe that either place is equivalent (from a user
> point of view),
> barring the different implementation requirements.
>
> In my view, the bigger cost here is not bandwidth but rather the disk
> IO, that will
> happen exactly in the same way in both cases. Messages can be cached,
> up to a certain point, either in broker or in client library. After
> that, in both cases,
> the messages will have to be fetched from bookies.
>
> Also, when implementing the delay feature in the client, the existing
> flow control
> mechanism is naturally applied to limit the overall amount of information
> that
> we have to keep track (the "currently tracked" messages). Some other
> mechanism
> would have to be done in the broker as well.
>
> Again, in general I'm more concerned of stuff that happens in broker
> because
> it will have to be scaled up 10s of thousands of times in a single
> process, while
> in client typically the requirements are much simpler.
>
> If the goal is to minimize the amount of redeliveries from broker ->
> client, there
> are multiple ways to achieve that with the client based approach (eg. send
> message id and delay time instead of the full payload to consumers as Ivan
> proposed).
>
> This seems to be simpler and with less overhead than having to persist
> the whole hash-wheel timer state into a ledger.


I agree that many optimizations can be applied to a client-side
approach. In a stable world, these approaches are technically
equivalent.

However, I do not agree with a few points:

First, based on my past production experience, network bandwidth on the
broker is a bigger cost than I/O in a multi-subscription case. Also, I have
heard that a few production users have experienced latency issues when
broker network bandwidth was saturated. So any mechanism that relies on
redeliveries is a big red flag to me.
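
As a rough illustration of this concern, here is a back-of-the-envelope sketch (an assumed model, not a measurement): broker egress grows with the number of subscriptions, and every message that is delivered a second time adds to it. `EgressEstimate`, its parameters, and the numbers below are hypothetical.

```java
// Hypothetical model for illustration only; not taken from Pulsar.
final class EgressEstimate {

    /**
     * Estimates the bytes a broker sends to consumers.
     *
     * @param messages            number of messages published
     * @param payloadBytes        size of each message payload
     * @param subscriptions       number of subscriptions on the topic
     * @param redeliveredFraction fraction of messages sent a second time
     *                            (0.0 = none, 1.0 = every message twice)
     */
    static long brokerEgressBytes(long messages, long payloadBytes,
                                  int subscriptions, double redeliveredFraction) {
        // Each subscription gets its own copy; redelivered messages count twice.
        double deliveriesPerMessage = subscriptions * (1.0 + redeliveredFraction);
        return Math.round(messages * payloadBytes * deliveriesPerMessage);
    }
}
```

Under these assumptions, 1,000 messages of 1 KiB with 3 subscriptions cost 3,072,000 bytes of broker egress without redelivery, and twice that if every delayed message is sent twice.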

Secondly, Pulsar currently uses more bandwidth on brokers than on
bookies. That is not a balanced state. I lean towards an approach that
can leverage the bookies’ idle bandwidth, rather than one that
potentially uses more bandwidth on brokers.

Thirdly, in my view, clock skew is not a technical issue but a
management issue. As Ivan and you have pointed out, there are many
ways of addressing clock skew. However, clock skew in a broker-side
approach is easier to manage and more predictable, while clock skew in a
client-side approach is much harder to manage and less predictable. This
unpredictability can significantly change the I/O or network pattern when
things go bad, and in a redelivery-style approach it can saturate the
broker network. If we are building a distributed system that can handle
this unpredictability, a broker-side approach is much friendlier to
manageability and incident management.

Lastly, I do agree that client-side approaches scale better than
server-side approaches in most cases. However, I don’t believe that is
the case here, and I haven’t seen anyone give a clear explanation of why a
broker approach is less scalable than the client-side approach.

Anyway, for manageability, bandwidth usage, and client simplicity, I am
more in favor of a broker-side approach, or at least an approach that is
not redelivery based. However, since the feature was requested by Penghui
and Ezequiel, I am also fine with this client-side approach if they are
okay with it.

- Sijie





Re: [DISCUSSION] Delayed message delivery

Posted by Matteo Merli <ma...@gmail.com>.

Trying to group and compress responses here.

> If consumer control the delayed message specific execution time we must
> trust clock of consumer, this can cause delayed message process ahead of
> time, some applications cannot tolerate this condition.

This is a problem that cannot be solved.
Even assuming the timestamps are assigned by brokers and are guaranteed
to be monotonic, this won't prevent 2 brokers from having clock skews.
That would result in different delivery delays.

Similarly, the broker timestamp might be assigned later compared to when a
publisher was "intending" to start the clock.

Barring super-precise clock synchronization techniques (which are way out
of the scope of this discussion), the only reasonable way to think about this is
that delays need to be orders of magnitude bigger than the average clock
skew experienced with common techniques (e.g. NTP). NTP clock skew will
generally be in the 10s of millis. Any delay > 1 second will hardly be
noticeably affected by these skews.

Additionally, any optimization of the timeout handling (like the hash-wheel
timer proposed in PIP-26) will trade off precision for efficiency. In that case,
the delays are managed in buckets, and can result in higher delays than
what was requested.
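
To make the bucketing trade-off concrete, here is a minimal sketch, assuming deadlines are rounded up to the next tick boundary. `TimerBuckets`, `bucketDeadline` and the tick size are hypothetical names for illustration; they are not taken from PIP-26 or from the Pulsar code base.

```java
// Hypothetical helper for illustration; not part of PIP-26 or Pulsar.
final class TimerBuckets {

    /**
     * Rounds a delivery deadline up to the end of its bucket, so that all
     * messages in the same bucket fire together. The added delay is always
     * less than one tick.
     */
    static long bucketDeadline(long deliverAtMillis, long tickMillis) {
        if (tickMillis <= 0) {
            throw new IllegalArgumentException("tickMillis must be positive");
        }
        long remainder = Math.floorMod(deliverAtMillis, tickMillis);
        // Already on a bucket boundary: no extra delay.
        return remainder == 0
                ? deliverAtMillis
                : deliverAtMillis + (tickMillis - remainder);
    }
}
```

With a 1-second tick, a message due at 10,250 ms would fire at 11,000 ms, so the extra delay stays below one tick.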

> 1. Fixed timeout, e.g.(with 10s, 30s, 10min delayed), this is the largest
> proportion in throughput of delayed message . A subscription with a fixed
> delayed time can approach to this scene.

I don't think that for fixed delays, any server-side implementation
would provide any advantage compared to doing:

```
while (true) {
    Message msg = consumer.receive();
    long delayMillis = calculateDelay(msg);
    if (delayMillis > 0) {
        Thread.sleep(delayMillis);
    }

    // Do something
    consumer.acknowledge(msg);
}
```

This will not need any support from broker. Also, there will be no redeliveries.

It could be wrapped in the client API, although I don't see that as
big of a problem.
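
`calculateDelay` is left undefined in the loop above. One possible shape, as a sketch only: assume the publish time is available from the message (in the Java client, `Message.getPublishTime()` returns it in milliseconds) and that the subscription uses a single fixed delay. `FixedDelay` and its parameters are hypothetical.

```java
// Hypothetical helper for illustration; names are not from the Pulsar API.
final class FixedDelay {

    /**
     * Returns how long the consumer should still wait before processing a
     * message, or 0 if the fixed delay has already elapsed.
     *
     * @param publishTimeMillis publish timestamp of the message
     * @param fixedDelayMillis  delay the application wants to apply
     * @param nowMillis         current time, e.g. System.currentTimeMillis()
     */
    static long calculateDelay(long publishTimeMillis, long fixedDelayMillis,
                               long nowMillis) {
        long deliverAtMillis = publishTimeMillis + fixedDelayMillis;
        return Math.max(0L, deliverAtMillis - nowMillis);
    }
}
```

Because every message carries the same delay, the message at the head of the queue is (approximately) always the next one due, which is why sleeping on it is enough in the fixed-delay case.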

> My concern of this category of approaches is "bandwidth" usage. It is
> basically trading bandwidth for complexity.

With mixed delays on a single topic, in any case there has to be some kind
of time-based sorting of the messages that needs to happen either at broker
or at client.
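
For the client-side variant, the time-based sorting can be sketched with a priority queue keyed by target delivery time. This is a minimal illustration, not the proposed client implementation; `DelayedTracker`, `add` and `pollReady` are hypothetical names, and message IDs are plain strings here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical tracker for illustration; not part of the Pulsar client.
final class DelayedTracker {

    private static final class Entry {
        final String messageId;
        final long deliverAtMillis;

        Entry(String messageId, long deliverAtMillis) {
            this.messageId = messageId;
            this.deliverAtMillis = deliverAtMillis;
        }
    }

    // Earliest target time sits at the head of the queue.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            Comparator.comparingLong((Entry e) -> e.deliverAtMillis));

    /** Remembers a delayed message by ID and target delivery time. */
    void add(String messageId, long deliverAtMillis) {
        queue.add(new Entry(messageId, deliverAtMillis));
    }

    /** Removes and returns, in time order, the IDs whose target time has passed. */
    List<String> pollReady(long nowMillis) {
        List<String> ready = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().deliverAtMillis <= nowMillis) {
            ready.add(queue.poll().messageId);
        }
        return ready;
    }
}
```

A periodic task would call `pollReady(System.currentTimeMillis())` and hand the returned IDs to whatever makes the messages visible to the application. Only the ID and timestamp are kept, so the memory per tracked message is small.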

Functionally, I believe that either place is equivalent (from a user
point of view), barring the different implementation requirements.

In my view, the bigger cost here is not bandwidth but rather disk I/O,
which will happen in exactly the same way in both cases. Messages can be
cached, up to a certain point, either in the broker or in the client
library. After that, in both cases, the messages will have to be fetched
from bookies.

Also, when implementing the delay feature in the client, the existing
flow control mechanism naturally limits the overall amount of
information that we have to keep track of (the "currently tracked"
messages). A comparable mechanism would have to be built in the broker
as well.

Again, in general I'm more concerned about what happens in the broker,
because it will have to be scaled up tens of thousands of times in a
single process, while in the client the requirements are typically much
simpler.

If the goal is to minimize the number of redeliveries from broker ->
client, there are multiple ways to achieve that with the client-based
approach (e.g. send the message ID and delay time instead of the full
payload to consumers, as Ivan proposed).

This seems simpler, and with less overhead, than having to persist
the whole hash-wheel timer state into a ledger.



--
Matteo Merli
<ma...@gmail.com>


On Fri, Jan 18, 2019 at 6:35 AM Ezequiel Lovelle
<ez...@gmail.com> wrote:
>
> Hi all, and sorry for the delay :)
>
> I'm probably going to repeat some things that have already been said, so
> apologies in advance.
>
> I think the two main features needed are the ones proposed:
> A. Producer delay (PIP-26). B. Consumer delay (PR #3155).
>
> Of course PIP-26 would also result in consumers receiving delayed messages;
> the important difference is which side decides on the delay.
>
> First, the easy one, PR #3155 (consumer delay):
>
> As others have stated before, this is the more trivial approach because
> every message gets exactly the same delay, which makes it predictable.
>
> I agree that adding logic to the broker should be avoided, but for this
> specific feature (#3155), which I don't think is complex, I believe there
> are other serious advantages:
>
>  1. Simplicity on the client side: we don't need to add any client code,
>     which is less error prone.
>  2. It avoids problems with out-of-date client clocks, which are a
>     headache for users to detect.
>  3. It avoids the large overhead of unnecessarily delivering non-expired
>     messages across the network.
>  4. Consumers are free to decide to consume messages with a delay,
>     regardless of the producer.
>  5. The delay is uniform for all messages, which is sometimes the right
>     solution to the problem, rather than arbitrary delays.
>
> I think it would be great if Pulsar could provide this kind of feature
> without relying on users knowing the details of the mechanism.
>
> For PIP-26:
>
> I think we could offer this for messages with longer delays, on the
> order of hours or days?
>
> If that is the case, we can assume a time granularity such as 1 minute,
> with each ledger representing 1 minute of time: truncate each message's
> delivery time to its minute and store the message in the special ledger
> for that minute.
> Users who want to receive a message scheduled some days in the future
> would rarely care about a margin of error of 1 minute.
>
> Of course, we need to somehow make the broker aware of this so that it
> only processes and consumes the ledger for the current minute.
> The broker would also be the one responsible for closing the ledger for
> the current minute once it has been processed.
>
> One problem I can think of with this approach: is it painful for BookKeeper
> to have a lot of open ledgers (one per minute per topic)?
>
> Another problem might be what happens if the consumer was not started.
> At startup the broker would have to look for ledgers older than the
> current time, and this might be expensive.
>
> Another, more trivial issue: we might need to refactor the current
> mechanism that deletes closed ledgers older than the time configured on
> the namespace.
>
> As a final note, I think it would be great to have both features in Pulsar,
> but sometimes not everything desired is achievable.
> And please correct me if I said something senseless.
>
> --
> *Ezequiel Lovelle*
>
>
> On Fri, 18 Jan 2019 at 05:51, PengHui Li <co...@gmail.com> wrote:
>
> > > So rather than specifying the absolute timestamp that the message
> > > should appear to the user, the dispatcher can specify the relative
> > > delay after dispatch that it should appear to the user.
> >
> > As matteo said the worst case would be that the applied delay to be higher
> > for some of the messages, if specify the relative delay to consumer,
> > if consumer offline for a period of time, consumer will receive many
> > delayed messages
> > after connect to broker again will cause the worst case more serious. It's
> > difficult to keep
> > consumers always online.
> >
> > In my personal perspective, i refer to use `delay level topic` to approach
> > smaller delays scene.
> > e.g(10s-topic, 30s-topic), this will not be too much topic. And we are
> > using dead letter topic to simulate
> > delay message feature, delayed topics has different delay level.
> >
> > For very long delays scene, in our practice, user may cancel it or restart
> > it.
> > After previous discussions, i agree that PIP-26 will make broker
> > more complexity.
> > So I had the idea to consider as a separate mechanism.
> >
> >
> > On Fri, Jan 18, 2019 at 3:22 PM Sijie Guo <gu...@gmail.com> wrote:
> >
> > > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org> wrote:
> > >
> > > > One thing missing from this discussion is details on the motivating
> > > > use-case. How many delayed messages per second are we expecting? And
> > > > what is the payload size?
> > > >
> > > > > If consumer control the delayed message specific execution time we
> > must
> > > > > trust clock of consumer, this can cause delayed message process ahead
> > > of
> > > > > time, some applications cannot tolerate this condition.
> > > >
> > > > This can be handled in a number of ways. Consumer clocks can be skewed
> > > > with regard to other clocks, but it is generally safe to assume that
> > > > clocks advance at the same rate, especially at the granularity of a
> > > > couple of hours.
> > > > So rather than specifying the absolute timestamp that the message
> > > > should appear to the user, the dispatcher can specify the relative
> > > > delay after dispatch that it should appear to the user.
> > > >
> > > > > > My concern of this category of approaches is "bandwidth" usage. It
> > is
> > > > > > basically trading bandwidth for complexity.
> > > > >
> > > > > @Sijie Guo <si...@apache.org> Agree with you, such an trading can
> > > cause
> > > > the
> > > > > broker's out going network to be more serious.
> > > >
> > > > I don't think PIP-26's approach may not use less bandwidth in this
> > > > regard. With PIP-26, the msg ids are stored in a ledger, and when the
> > > > timeout triggers it dispatches? Are all the delayed message being
> > > > cached at the broker? If so, that is using a lot of memory, and it's
> > > > exactly the kind of memory usage pattern that is very bad for JVM
> > > > garbage collection. If not, then you have to read the message back in
> > > > from bookkeeper, so the bandwidth usage is the same, though on a
> > > > different path.
> > > >
> > > > In the client side approach, the message could be cached to avoid a
> > > > redispatch. When I was discussing with Matteo, we discussed this. The
> > > > redelivery logic has to be there in any case, as any cache (broker or
> > > > client side) must have a limited size.
> > > > Another option would be to skip sending the payload for delayed
> > > > messages, and only send it when the client request redelivery, but
> > > > this has the same issue with regard to the entry likely falling out
> > > > the cache at the broker-side.
> > >
> > >
> > > There are bandwidth usage at either approaches for sure. The main
> > > difference between broker-side and client-side approaches is which part
> > of
> > > the bandwidth is used.
> > >
> > > In the broker-side approach, it is using the bookies egress and broker
> > > ingress bandwidth. In a typical pulsar deployment, bookies egress is
> > mostly
> > > idle unless there are consumers falling behind.
> > >
> > > In the client-side approach, it is using broker’s egress bandwidth and
> > > potentially bookies’ egress bandwidth. Brokers’ egress is critical since
> > it
> > > is shared across consumers. So if the broker egress is doubled, it is a
> > red
> > > flag.
> > >
> > > Although I agree the bandwidth usage depends on workloads. But in theory,
> > > broker-side approach is more friendly to resource usage and a better
> > > approach to use the resources in a multi layered architecture. Because it
> > > uses less bandwidth at broker side. A client side can cause more
> > bandwidth
> > > usage at broker side.
> > >
> > > Also as what penghui pointed out, clock screw can be another factor
> > causing
> > > more traffic in a fanout case. In a broker-side approach, the deferred is
> > > done in a central point, so when the deferred time point kicks in, broker
> > > just need to read the data one time from bookies. However in a
> > client-side
> > > approach, the messages are asked by different subscriptions, different
> > > subscription can ask the deferred message at any time based on their
> > > clocks.
> > >
> > >
> > >
> > > >
> > > > -Ivan
> > > >
> > >
> >

Re: [DISCUSSION] Delayed message delivery

Posted by Ezequiel Lovelle <ez...@gmail.com>.

Hi all, and sorry for the delay :)

I'm probably going to repeat some things that were already said, so
apologies in advance.

I think the two main features needed are the ones already proposed:
  A. Producer-side delay (PIP-26)
  B. Consumer-side delay (PR #3155)

Of course PIP-26 also results in consumers receiving delayed messages,
but the important difference is which side decides on the delay.

First, the easy one, PR #3155 (consumer-side delay):

As others have stated before, this is the simpler approach because every
message gets exactly the same delay, which makes it predictable.

I agree that adding logic to the broker should be avoided, but I don't
think this specific feature (#3155) is complex, and I believe it has
other serious advantages:

 1. Simplicity on the client side: no code needs to be added there,
    which is less error prone.
 2. It avoids problems caused by client clocks being out of sync, which
    are a headache for users to detect.
 3. It avoids the large overhead of sending messages whose delay has
    not yet expired across the network unnecessarily.
 4. Consumers are free to decide to consume messages with a delay,
    regardless of the producer.
 5. The delay is uniform for all messages, which is sometimes what the
    problem calls for, rather than arbitrary delays.

I think it would be great if Pulsar could provide this kind of feature
without relying on users knowing the heavy details of the mechanism.

For PIP-26:

I think we could offer this for messages with longer delays, on the
order of hours or days?

If that is the case, we can assume a time granularity of about 1 minute:
each ledger represents 1 minute of time, and each message's delivery
time is truncated to its corresponding minute and stored in that
special ledger.
Users who want to receive a message scheduled some days in the future
would rarely care about a margin of error of 1 minute.

Of course we somehow need to make the broker aware of this, so that it
only processes and consumes the ledger for the current minute.
The broker would also be the one responsible for closing the ledger of
the current minute once it has been processed.
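
A minimal sketch of the per-minute truncation described above. It is
only an illustration: the names bucket_for and MinuteBuckets are
hypothetical, the "ledger per minute" is modelled as an in-memory list,
and nothing here is a BookKeeper or Pulsar API.

```python
from collections import defaultdict

BUCKET_MS = 60_000  # assumed granularity: 1 minute, in milliseconds


def bucket_for(deliver_at_ms):
    """Truncate a delivery timestamp (ms since epoch) to the start of its minute."""
    return deliver_at_ms - (deliver_at_ms % BUCKET_MS)


class MinuteBuckets:
    """In-memory stand-in for 'one ledger per minute' (hypothetical, not a real API)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, message_id, deliver_at_ms):
        # Store the message id under the minute its delivery time falls into.
        self._buckets[bucket_for(deliver_at_ms)].append(message_id)

    def drain_due(self, now_ms):
        """Remove and return the ids of every bucket up to the current minute,
        oldest bucket first. Older buckets are included so that a late start
        still picks up what was missed."""
        current = bucket_for(now_ms)
        due = []
        for start in sorted(b for b in self._buckets if b <= current):
            due.extend(self._buckets.pop(start))
        return due
```

Because the timestamp is truncated, a message can become due up to one
minute earlier than requested; rounding up instead would make it up to
one minute late.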

One problem I can think of with this approach: is it painful for
BookKeeper to have a lot of open ledgers (one per minute per topic)?

Another problem might be what happens if the consumer was not started.
At startup the broker would have to look for ledgers potentially older
than its current time, and this might be expensive.

Another, more trivial issue: we might need to refactor the current
mechanism that deletes closed ledgers older than the time configured on
the namespace.

As a final note, I think it would be great to have both features in
Pulsar, but sometimes not everything desired is achievable.
And please correct me if I said something that makes no sense.

--
*Ezequiel Lovelle*


On Fri, 18 Jan 2019 at 05:51, PengHui Li <co...@gmail.com> wrote:

> > So rather than specifying the absolute timestamp that the message
> > should appear to the user, the dispatcher can specify the relative
> > delay after dispatch that it should appear to the user.
>
> As matteo said the worst case would be that the applied delay to be higher
> for some of the messages, if specify the relative delay to consumer,
> if consumer offline for a period of time, consumer will receive many
> delayed messages
> after connect to broker again will cause the worst case more serious. It's
> difficult to keep
> consumers always online.
>
> In my personal perspective, i refer to use `delay level topic` to approach
> smaller delays scene.
> e.g(10s-topic, 30s-topic), this will not be too much topic. And we are
> using dead letter topic to simulate
> delay message feature, delayed topics has different delay level.
>
> For very long delays scene, in our practice, user may cancel it or restart
> it.
> After previous discussions, i agree that PIP-26 will make broker
> more complexity.
> So I had the idea to consider as a separate mechanism.
>
>
> On Fri, Jan 18, 2019 at 3:22 PM Sijie Guo <gu...@gmail.com> wrote:
>
> > On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org> wrote:
> >
> > > One thing missing from this discussion is details on the motivating
> > > use-case. How many delayed messages per second are we expecting? And
> > > what is the payload size?
> > >
> > > > If consumer control the delayed message specific execution time we
> must
> > > > trust clock of consumer, this can cause delayed message process ahead
> > of
> > > > time, some applications cannot tolerate this condition.
> > >
> > > This can be handled in a number of ways. Consumer clocks can be skewed
> > > with regard to other clocks, but it is generally safe to assume that
> > > clocks advance at the same rate, especially at the granularity of a
> > > couple of hours.
> > > So rather than specifying the absolute timestamp that the message
> > > should appear to the user, the dispatcher can specify the relative
> > > delay after dispatch that it should appear to the user.
> > >
> > > > > My concern of this category of approaches is "bandwidth" usage. It
> is
> > > > > basically trading bandwidth for complexity.
> > > >
> > > > @Sijie Guo <si...@apache.org> Agree with you, such an trading can
> > cause
> > > the
> > > > broker's out going network to be more serious.
> > >
> > > I don't think PIP-26's approach may not use less bandwidth in this
> > > regard. With PIP-26, the msg ids are stored in a ledger, and when the
> > > timeout triggers it dispatches? Are all the delayed message being
> > > cached at the broker? If so, that is using a lot of memory, and it's
> > > exactly the kind of memory usage pattern that is very bad for JVM
> > > garbage collection. If not, then you have to read the message back in
> > > from bookkeeper, so the bandwidth usage is the same, though on a
> > > different path.
> > >
> > > In the client side approach, the message could be cached to avoid a
> > > redispatch. When I was discussing with Matteo, we discussed this. The
> > > redelivery logic has to be there in any case, as any cache (broker or
> > > client side) must have a limited size.
> > > Another option would be to skip sending the payload for delayed
> > > messages, and only send it when the client request redelivery, but
> > > this has the same issue with regard to the entry likely falling out
> > > the cache at the broker-side.
> >
> >
> > There are bandwidth usage at either approaches for sure. The main
> > difference between broker-side and client-side approaches is which part
> of
> > the bandwidth is used.
> >
> > In the broker-side approach, it is using the bookies egress and broker
> > ingress bandwidth. In a typical pulsar deployment, bookies egress is
> mostly
> > idle unless there are consumers falling behind.
> >
> > In the client-side approach, it is using broker’s egress bandwidth and
> > potentially bookies’ egress bandwidth. Brokers’ egress is critical since
> it
> > is shared across consumers. So if the broker egress is doubled, it is a
> red
> > flag.
> >
> > Although I agree the bandwidth usage depends on workloads. But in theory,
> > broker-side approach is more friendly to resource usage and a better
> > approach to use the resources in a multi layered architecture. Because it
> > uses less bandwidth at broker side. A client side can cause more
> bandwidth
> > usage at broker side.
> >
> > Also as what penghui pointed out, clock screw can be another factor
> causing
> > more traffic in a fanout case. In a broker-side approach, the deferred is
> > done in a central point, so when the deferred time point kicks in, broker
> > just need to read the data one time from bookies. However in a
> client-side
> > approach, the messages are asked by different subscriptions, different
> > subscription can ask the deferred message at any time based on their
> > clocks.
> >
> >
> >
> > >
> > > -Ivan
> > >
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by PengHui Li <co...@gmail.com>.

> So rather than specifying the absolute timestamp that the message
> should appear to the user, the dispatcher can specify the relative
> delay after dispatch that it should appear to the user.

As Matteo said, the worst case would be that the applied delay is higher
for some of the messages. If a relative delay is specified to the
consumer and the consumer is offline for a period of time, it will
receive many delayed messages after reconnecting to the broker, which
makes that worst case more serious. It's difficult to keep consumers
always online.

Personally, I prefer using "delay level" topics for the shorter-delay
scenario, e.g. a 10s-topic and a 30s-topic; this will not result in too
many topics. We are currently using the dead letter topic to simulate
the delayed message feature, with delayed topics at different delay
levels.
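
A rough sketch of how a requested delay could be mapped to a "delay
level" topic. The level list, the topic naming scheme and the choice to
round up to the next level are all assumptions made for illustration;
none of them come from Pulsar or from the setup described above.

```python
import bisect

# Assumed delay levels in seconds, sorted ascending (hypothetical values).
DELAY_LEVELS_S = [10, 30, 600]


def delay_topic(base_topic, delay_s):
    """Return the delay-level topic for a requested delay.

    Picks the smallest configured level that is >= the requested delay,
    so a message is never delivered earlier than asked.
    """
    i = bisect.bisect_left(DELAY_LEVELS_S, delay_s)
    if i == len(DELAY_LEVELS_S):
        raise ValueError("delay %ss exceeds the largest delay level" % delay_s)
    # Hypothetical naming scheme: <base>-delay-<level>s
    return "%s-delay-%ds" % (base_topic, DELAY_LEVELS_S[i])
```

With these levels, a request for 11 seconds lands on the 30s topic, so
the cost of keeping the number of topics small is a coarser delay.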

For the very-long-delay scenario, in our practice the user may cancel
or restart it.
After the previous discussions, I agree that PIP-26 would make the
broker more complex, so my idea is to consider it as a separate
mechanism.


On Fri, Jan 18, 2019 at 3:22 PM Sijie Guo <gu...@gmail.com> wrote:

> On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org> wrote:
>
> > One thing missing from this discussion is details on the motivating
> > use-case. How many delayed messages per second are we expecting? And
> > what is the payload size?
> >
> > > If consumer control the delayed message specific execution time we must
> > > trust clock of consumer, this can cause delayed message process ahead
> of
> > > time, some applications cannot tolerate this condition.
> >
> > This can be handled in a number of ways. Consumer clocks can be skewed
> > with regard to other clocks, but it is generally safe to assume that
> > clocks advance at the same rate, especially at the granularity of a
> > couple of hours.
> > So rather than specifying the absolute timestamp that the message
> > should appear to the user, the dispatcher can specify the relative
> > delay after dispatch that it should appear to the user.
> >
> > > > My concern of this category of approaches is "bandwidth" usage. It is
> > > > basically trading bandwidth for complexity.
> > >
> > > @Sijie Guo <si...@apache.org> Agree with you, such an trading can
> cause
> > the
> > > broker's out going network to be more serious.
> >
> > I don't think PIP-26's approach may not use less bandwidth in this
> > regard. With PIP-26, the msg ids are stored in a ledger, and when the
> > timeout triggers it dispatches? Are all the delayed message being
> > cached at the broker? If so, that is using a lot of memory, and it's
> > exactly the kind of memory usage pattern that is very bad for JVM
> > garbage collection. If not, then you have to read the message back in
> > from bookkeeper, so the bandwidth usage is the same, though on a
> > different path.
> >
> > In the client side approach, the message could be cached to avoid a
> > redispatch. When I was discussing with Matteo, we discussed this. The
> > redelivery logic has to be there in any case, as any cache (broker or
> > client side) must have a limited size.
> > Another option would be to skip sending the payload for delayed
> > messages, and only send it when the client request redelivery, but
> > this has the same issue with regard to the entry likely falling out
> > the cache at the broker-side.
>
>
> There are bandwidth usage at either approaches for sure. The main
> difference between broker-side and client-side approaches is which part of
> the bandwidth is used.
>
> In the broker-side approach, it is using the bookies egress and broker
> ingress bandwidth. In a typical pulsar deployment, bookies egress is mostly
> idle unless there are consumers falling behind.
>
> In the client-side approach, it is using broker’s egress bandwidth and
> potentially bookies’ egress bandwidth. Brokers’ egress is critical since it
> is shared across consumers. So if the broker egress is doubled, it is a red
> flag.
>
> Although I agree the bandwidth usage depends on workloads. But in theory,
> broker-side approach is more friendly to resource usage and a better
> approach to use the resources in a multi layered architecture. Because it
> uses less bandwidth at broker side. A client side can cause more bandwidth
> usage at broker side.
>
> Also as what penghui pointed out, clock screw can be another factor causing
> more traffic in a fanout case. In a broker-side approach, the deferred is
> done in a central point, so when the deferred time point kicks in, broker
> just need to read the data one time from bookies. However in a client-side
> approach, the messages are asked by different subscriptions, different
> subscription can ask the deferred message at any time based on their
> clocks.
>
>
>
> >
> > -Ivan
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

On Fri, Jan 18, 2019 at 2:51 PM Ivan Kelly <iv...@apache.org> wrote:

> One thing missing from this discussion is details on the motivating
> use-case. How many delayed messages per second are we expecting? And
> what is the payload size?
>
> > If consumer control the delayed message specific execution time we must
> > trust clock of consumer, this can cause delayed message process ahead of
> > time, some applications cannot tolerate this condition.
>
> This can be handled in a number of ways. Consumer clocks can be skewed
> with regard to other clocks, but it is generally safe to assume that
> clocks advance at the same rate, especially at the granularity of a
> couple of hours.
> So rather than specifying the absolute timestamp that the message
> should appear to the user, the dispatcher can specify the relative
> delay after dispatch that it should appear to the user.
>
> > > My concern of this category of approaches is "bandwidth" usage. It is
> > > basically trading bandwidth for complexity.
> >
> > @Sijie Guo <si...@apache.org> Agree with you, such an trading can cause
> the
> > broker's out going network to be more serious.
>
> I don't think PIP-26's approach may not use less bandwidth in this
> regard. With PIP-26, the msg ids are stored in a ledger, and when the
> timeout triggers it dispatches? Are all the delayed message being
> cached at the broker? If so, that is using a lot of memory, and it's
> exactly the kind of memory usage pattern that is very bad for JVM
> garbage collection. If not, then you have to read the message back in
> from bookkeeper, so the bandwidth usage is the same, though on a
> different path.
>
> In the client side approach, the message could be cached to avoid a
> redispatch. When I was discussing with Matteo, we discussed this. The
> redelivery logic has to be there in any case, as any cache (broker or
> client side) must have a limited size.
> Another option would be to skip sending the payload for delayed
> messages, and only send it when the client request redelivery, but
> this has the same issue with regard to the entry likely falling out
> the cache at the broker-side.

There is bandwidth usage with either approach, for sure. The main
difference between the broker-side and client-side approaches is which
part of the bandwidth is used.

In the broker-side approach, it uses bookie egress and broker ingress
bandwidth. In a typical Pulsar deployment, bookie egress is mostly idle
unless there are consumers falling behind.

In the client-side approach, it uses broker egress bandwidth and
potentially bookie egress bandwidth. Broker egress is critical since it
is shared across consumers, so if broker egress is doubled, it is a red
flag.

I agree that the bandwidth usage depends on the workload. But in theory,
the broker-side approach is friendlier to resource usage and a better
way to use the resources in a multi-layered architecture, because it
uses less bandwidth on the broker side. A client-side approach can cause
more bandwidth usage on the broker side.

Also, as Penghui pointed out, clock skew can be another factor causing
more traffic in a fan-out case. In a broker-side approach, the deferral
is done at a central point, so when the deferred time arrives, the
broker only needs to read the data once from the bookies. In a
client-side approach, however, the messages are requested by different
subscriptions, and each subscription can ask for the deferred message at
any time based on its own clock.

>
> -Ivan
>

Re: [DISCUSSION] Delayed message delivery

Posted by Ivan Kelly <iv...@apache.org>.

One thing missing from this discussion is details on the motivating
use-case. How many delayed messages per second are we expecting? And
what is the payload size?

> If consumer control the delayed message specific execution time we must
> trust clock of consumer, this can cause delayed message process ahead of
> time, some applications cannot tolerate this condition.

This can be handled in a number of ways. Consumer clocks can be skewed
with regard to other clocks, but it is generally safe to assume that
clocks advance at the same rate, especially at the granularity of a
couple of hours.
So rather than specifying the absolute timestamp that the message
should appear to the user, the dispatcher can specify the relative
delay after dispatch that it should appear to the user.
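
A small sketch of the relative-delay idea, under stated assumptions:
the dispatcher computes the remaining delay against its own clock, and
the consumer counts that delay down on a local monotonic clock. The
names remaining_delay_ms and RelativeDelay are hypothetical and not
part of any Pulsar API, and network transit time is ignored.

```python
import time


def remaining_delay_ms(deliver_at_ms, broker_now_ms):
    """Dispatcher side: turn an absolute target time into a relative delay."""
    return max(0, deliver_at_ms - broker_now_ms)


class RelativeDelay:
    """Consumer side: counts the delay down on a local monotonic clock,
    so the consumer's wall-clock offset does not matter."""

    def __init__(self, remaining_ms, clock=time.monotonic):
        self._clock = clock
        # Monotonic instant (in seconds) at which the message becomes visible.
        self._ready_at = clock() + remaining_ms / 1000.0

    def is_ready(self):
        return self._clock() >= self._ready_at
```

This relies only on both clocks advancing at roughly the same rate, not
on them agreeing about the current time.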

> > My concern of this category of approaches is "bandwidth" usage. It is
> > basically trading bandwidth for complexity.
>
> @Sijie Guo <si...@apache.org> Agree with you, such an trading can cause the
> broker's out going network to be more serious.

I don't think PIP-26's approach would use less bandwidth in this
regard. With PIP-26, the msg ids are stored in a ledger, and when the
timeout triggers it dispatches? Are all the delayed messages being
cached at the broker? If so, that uses a lot of memory, and it's
exactly the kind of memory usage pattern that is very bad for JVM
garbage collection. If not, then you have to read the message back in
from BookKeeper, so the bandwidth usage is the same, though on a
different path.

In the client-side approach, the message could be cached to avoid a
redispatch. When I was discussing with Matteo, we discussed this. The
redelivery logic has to be there in any case, as any cache (broker or
client side) must have a limited size.
Another option would be to skip sending the payload for delayed
messages, and only send it when the client requests redelivery, but
this has the same issue with regard to the entry likely falling out of
the cache at the broker side.
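
To illustrate why redelivery is still needed with a cache, here is a
sketch of a size-limited client-side cache with a redelivery fallback.
BoundedMessageCache and deliver_due are hypothetical names, and the
oldest-first eviction policy is an assumption.

```python
from collections import OrderedDict


class BoundedMessageCache:
    """Keeps at most max_entries payloads; the oldest entry is evicted first."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._entries = OrderedDict()

    def put(self, message_id, payload):
        self._entries[message_id] = payload
        self._entries.move_to_end(message_id)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the oldest entry

    def take(self, message_id):
        """Remove and return the payload, or None if it was evicted."""
        return self._entries.pop(message_id, None)


def deliver_due(due_ids, cache):
    """Split due message ids into payloads served from the cache and ids
    that must be requested from the broker again."""
    local, redeliver = [], []
    for message_id in due_ids:
        payload = cache.take(message_id)
        if payload is None:
            redeliver.append(message_id)
        else:
            local.append((message_id, payload))
    return local, redeliver
```

Only the ids in the second list cost broker egress a second time, so
the cache size directly bounds how much redelivery traffic is saved.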

-Ivan

Re: [DISCUSSION] Delayed message delivery

Posted by PengHui Li <co...@gmail.com>.

If the consumer controls the specific execution time of a delayed
message, we must trust the consumer's clock. This can cause delayed
messages to be processed ahead of time, and some applications cannot
tolerate that.

> My concern of this category of approaches is "bandwidth" usage. It is
> basically trading bandwidth for complexity.

@Sijie Guo <si...@apache.org> I agree with you; such a trade-off can put
more serious load on the broker's outgoing network.

Coming back to the user scenarios, we have a lot of delayed message
requirements:

1. Fixed timeout, e.g. 10s, 30s or 10min of delay. This accounts for the
largest share of delayed message throughput. A subscription with a fixed
delay time can cover this scenario.

2. Arbitrary timeout, e.g. 10min, 60min, 62min, 78min... of delay.
Handling this scenario on the client is difficult (the delay times are
too widely distributed). PIP-26 can cover this scenario.

As came out of the previous discussions around the delayed message
delivery proposals, we should try to avoid adding complexity in
the broker dispatching code. Implementing a fixed-delay subscription can
be as simple as https://github.com/apache/pulsar/pull/3155, but it can't
cover the arbitrary timeout scenario.

I have an idea: we can turn the arbitrary timeout feature into an
optional component, like the function worker, SQL worker, or websocket
proxy. This way we keep broker dispatching simpler, and the component
can be deployed separately.

On Thu, Jan 17, 2019 at 5:43 PM Sijie Guo <gu...@gmail.com> wrote:

> On Thu, Jan 17, 2019 at 2:58 PM Matteo Merli <mm...@apache.org> wrote:
>
> > After a long delay (no pun intended), I finally got through the
> > previous discussions around the delayed message delivery proposals.
> > I'm referring to PIP-26
> > https://github.com/apache/pulsar/wiki/PIP-26%3A-Delayed-Message-Delivery
> > and the Pull Request at #3155
> > https://github.com/apache/pulsar/pull/3155
> >
> > To summarize these proposals (correct me if I'm getting any point wrong):
> >
> >  * PIP-26
> >     - Producer sets arbitrary timeout on each message
> >     - Broker keeps a hash-wheel timer (backed by ledger) to keep track
> > of messages for which the dispatch has to be deferred
> >
> >  * PR #3155
> >    - Consumer specify a fixed time delay to consume messages
> >    - Broker will defer delivery by that time
> >
> > As I stated previously, we should try to avoid adding complexity in
> > the broker dispatching code, unless there's a clear benefit compared
> > to do the same operation in client library.
> >
> > After discussing with Ivan, I wanted to share this alternative approach.
> >
> > Key points:
> >   * Application set arbitrary timeout on each message
> >   * Broker is unchanged
> >   * Consumer (in client library) will make these messages visible to
> > application after delay has expired
> >
> > Implementation notes:
> >   * Producer change is trivial. We just need to add new field in
> > message metadata (similar as described in PIP-26)
> >   * On consumer side, the following will happen:
> >      - Messages get added to receiverQueue
> >      - When application calls receive, we might get from the queue a
> > message with delay.
> >      - This message is not passed to application. Rather insert the
> > message ID into a priority queue (or equivalent structure), ordered by
> > target time.
> >      - At this point messages are not added to the ack-timeout tracker
> >      - Periodically, we check the head of the priority queue to see if
> > there's anything ready
> >         - If so, we request the broker to "redeliver" these messages,
> > using the same mechanism as ack-timeout:
> > CommandRedeliverUnacknowledgedMessages
> >           with a list of message ids)
> >
>
> If I understand this correctly, this is an enhanced variation of
> "ack-timeout"-ish approach that supports arbitrary delays.
>
> My concern of this category of approaches is "bandwidth" usage. It is
> basically trading bandwidth for complexity.
>
> A *deferred* message will be delivered "twice" from brokers (if there is no
> cache or cache miss). If a topic's traffic is M bytes/second,
> there are N subscriptions. The traffic will effectively be 2 * M * N, which
> can potentially be a red flag to the users who
> rely on this feature.
>
> But I am also fine with approach if most of people don't in favor of
> changing the dispatch logic at broker side.
>
>
>
> > This approach will ensure:
> >  * We can support arbitrary delays
> >  * No changes and no overhead in broker - No need to configure
> > policies for delay activation
> >  * Works well with existing flow control mechanism: messages are
> > dequeued so that we can process messages with smaller delays
> >  * Amount of memory required in client side is limited.
> >      - We just keep message ids (we could consider caching few
> > messages as well, as an optimization)
> >      - Broker has a limit of unacked messages pushed to a consumer
> > (default 50K). I don't expect this being a particular problem.
> >        If there a lot of messages with big differences in the delay
> > value, the worst case would be that the applied delay to be higher
> >        for some of the messages.
> >
> > Any thoughts on this?
> >
> > --
> > Matteo Merli
> > <mm...@apache.org>
> >
>

Re: [DISCUSSION] Delayed message delivery

Posted by Sijie Guo <gu...@gmail.com>.

On Thu, Jan 17, 2019 at 2:58 PM Matteo Merli <mm...@apache.org> wrote:

> After a long delay (no pun intended), I finally got through the
> previous discussions around the delayed message delivery proposals.
> I'm referring to PIP-26
> https://github.com/apache/pulsar/wiki/PIP-26%3A-Delayed-Message-Delivery
> and the Pull Request at #3155
> https://github.com/apache/pulsar/pull/3155
>
> To summarize these proposals (correct me if I'm getting any point wrong):
>
>  * PIP-26
>     - Producer sets arbitrary timeout on each message
>     - Broker keeps a hash-wheel timer (backed by ledger) to keep track
> of messages for which the dispatch has to be deferred
>
>  * PR #3155
>    - Consumer specify a fixed time delay to consume messages
>    - Broker will defer delivery by that time
>
> As I stated previously, we should try to avoid adding complexity in
> the broker dispatching code, unless there's a clear benefit compared
> to do the same operation in client library.
>
> After discussing with Ivan, I wanted to share this alternative approach.
>
> Key points:
>   * Application set arbitrary timeout on each message
>   * Broker is unchanged
>   * Consumer (in client library) will make these messages visible to
> application after delay has expired
>
> Implementation notes:
>   * Producer change is trivial. We just need to add new field in
> message metadata (similar as described in PIP-26)
>   * On consumer side, the following will happen:
>      - Messages get added to receiverQueue
>      - When application calls receive, we might get from the queue a
> message with delay.
>      - This message is not passed to application. Rather insert the
> message ID into a priority queue (or equivalent structure), ordered by
> target time.
>      - At this point messages are not added to the ack-timeout tracker
>      - Periodically, we check the head of the priority queue to see if
> there's anything ready
>         - If so, we request the broker to "redeliver" these messages,
> using the same mechanism as ack-timeout:
> CommandRedeliverUnacknowledgedMessages
>           with a list of message ids)
>

If I understand this correctly, this is an enhanced variation of the
"ack-timeout"-ish approach that supports arbitrary delays.

My concern with this category of approaches is "bandwidth" usage. It is
basically trading bandwidth for complexity.
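
For reference, the consumer-side mechanism in the quoted proposal can be
sketched roughly as below. This is an illustration only: DelayTracker
is a hypothetical name, timestamps are plain millisecond integers, and
the sketch does not call any Pulsar client API. Each id returned by
pop_ready would be sent in a redelivery request, which is where the
extra bandwidth comes from.

```python
import heapq


class DelayTracker:
    """Holds only message ids, ordered by the time they should become visible."""

    def __init__(self):
        self._heap = []  # entries are (deliver_at_ms, seq, message_id)
        self._seq = 0    # tie-breaker, so message ids never get compared

    def defer(self, message_id, deliver_at_ms):
        """Record a delayed message instead of handing it to the application."""
        heapq.heappush(self._heap, (deliver_at_ms, self._seq, message_id))
        self._seq += 1

    def pop_ready(self, now_ms):
        """Called periodically: remove and return ids whose delay has expired."""
        ready = []
        while self._heap and self._heap[0][0] <= now_ms:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

Only the head of the heap is inspected on each check, so a periodic
call stays cheap even with many deferred ids.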

A *deferred* message will be delivered "twice" from the brokers (if there
is no cache, or on a cache miss). If a topic's traffic is M bytes/second
and there are N subscriptions, the traffic will effectively be
2 * M * N, which can potentially be a red flag to the users who rely on
this feature.
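
The arithmetic can be written out as a small function. The parameters
deferred_fraction and cache_hit_ratio are assumptions added for
illustration; with their defaults (every message deferred, no cache)
the result is the 2 * M * N figure above.

```python
def broker_egress(m_bytes_per_s, n_subscriptions,
                  deferred_fraction=1.0, cache_hit_ratio=0.0):
    """Estimate broker egress in bytes/second for the client-side approach.

    Every message is sent once to each subscription; deferred messages
    that miss the client cache are sent a second time.
    """
    first_delivery = m_bytes_per_s * n_subscriptions
    redelivery = first_delivery * deferred_fraction * (1.0 - cache_hit_ratio)
    return first_delivery + redelivery


# 10 MB/s topic, 3 subscriptions, everything deferred, no cache: 2 * M * N
worst_case = broker_egress(10_000_000, 3)
```

With half of the messages deferred and a 50% cache hit ratio, the same
topic would need 1.25 * M * N instead.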

But I am also fine with this approach if most people are not in favor
of changing the dispatch logic on the broker side.



> This approach will ensure:
>  * We can support arbitrary delays
>  * No changes and no overhead in broker - No need to configure
> policies for delay activation
>  * Works well with existing flow control mechanism: messages are
> dequeued so that we can process messages with smaller delays
>  * Amount of memory required in client side is limited.
>      - We just keep message ids (we could consider caching few
> messages as well, as an optimization)
>      - Broker has a limit of unacked messages pushed to a consumer
> (default 50K). I don't expect this being a particular problem.
>        If there a lot of messages with big differences in the delay
> value, the worst case would be that the applied delay to be higher
>        for some of the messages.
>
> Any thoughts on this?
>
> --
> Matteo Merli
> <mm...@apache.org>
>