You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Sijie Guo <gu...@gmail.com> on 2017/08/23 21:52:18 UTC

Add EventTime to pulsar messages

Hi pulsar folks,

Currently Pulsar messages has fields for `publish_time`. However in some
use cases for example stream computing, `event_time` is required. I'd like
to start a proposal to add `event_time` support to pulsar messages.

This proposal only covers adding `event_time` to pulsar messages and make
it available to both producers and subscribers. It doesn't cover advanced
features, such as event-time index or rewind subscriptions based on
event-time.

I created the proposal here:
https://gist.github.com/sijie/60324cc892643961b923593a597109ab

Please go over and provide your feedback/comments.

Also, since I don't have any permissions on writing pulsar wiki pages, can
any pulsar committers help me add this PIP to the wiki pages if it is
accepted?

- Sijie

Re: Add EventTime to pulsar messages

Posted by Sahaya Andrews <an...@apache.org>.
Ok, got it. I know you had specific reason for this proposal, but
wasn't clear initially. Thanks for clarifying.

Andrews.

On Wed, Aug 23, 2017 at 3:56 PM, Sijie Guo <gu...@gmail.com> wrote:
> On Wed, Aug 23, 2017 at 3:19 PM, Sahaya Andrews <an...@apache.org> wrote:
>
>> If this is not going to be used by the broker for anything, why can't
>> this be set in message property key/value?
>>
>
> Currently it is not used by broker. However, we might leverage field later
> on to provide advanced features like event-time based index. It is not
> covered by this proposal.
>
> Also there is another consideration - adding this field to message metadata
> is more serialization friendly than putting it into key/value property.
> this is important for stream computing,
> because this field might be accessed when processing every message. the
> overhead in key/value property is much higher than an int64 field.
>
> Hope this make sense.
>
> - Sijie
>
>
>>
>> Andrews
>>
>> On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org> wrote:
>> > Proposal looks good to me.
>> >
>> > Added to wiki:
>> > https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
>> >
>> > On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com> wrote:
>> >
>> >> Hi pulsar folks,
>> >>
>> >> Currently Pulsar messages has fields for `publish_time`. However in some
>> >> use cases for example stream computing, `event_time` is required. I'd
>> like
>> >> to start a proposal to add `event_time` support to pulsar messages.
>> >>
>> >> This proposal only covers adding `event_time` to pulsar messages and
>> make
>> >> it available to both producers and subscribers. It doesn't cover
>> advanced
>> >> features, such as event-time index or rewind subscriptions based on
>> >> event-time.
>> >>
>> >> I created the proposal here:
>> >> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
>> >>
>> >> Please go over and provide your feedback/comments.
>> >>
>> >> Also, since I don't have any permissions on writing pulsar wiki pages,
>> can
>> >> any pulsar committers help me add this PIP to the wiki pages if it is
>> >> accepted?
>> >>
>> >> - Sijie
>> >>
>> > --
>> > Matteo Merli
>> > <mm...@apache.org>
>>

Re: Add EventTime to pulsar messages

Posted by Rajan Dhabalia <rd...@apache.org>.
> Didn't get this, can you clarify?

I think I misinterpreted previous reply where I thought we want to add
"eventTIme" field to avoid deserialization of  Message-Metadata so, I asked
question how to retrieve "eventTime" field without deserializing.

> It's certain that an application can use a (string -> string) property,
though it would be a custom field and it will need to be in string format
and also carry the field name.
But now, I understand actual reason for not using properties to avoid extra
"key" in the message and keeping int64 value instead string.

> It's also nice to expose both `getPublishTime()` and `getEventTime()`
directly in a consistent way.
Sure.

Thanks,
Rajan

On Wed, Aug 23, 2017 at 4:31 PM, Matteo Merli <ma...@gmail.com>
wrote:

> > So, I believe in your usecase, we want to skip metadata-deserialization
> and
> > idea is to read only "EventTime" field (field-12) without deserializing
> > message-metada using ByteBuf-offset (eg: read last int64 using end of
> > metadata-index).? Don't you think it will prevent us to add additional
> > String-field in future because then it will be hard to know "EventTime"
> > index?
>
> Didn't get this, can you clarify?
>
> In my view, the event time is a common concept that can be useful in many
> scenarios, every time there is the chance for the message being published
> to refer to some event that happened before the publishing time.
>
> It's also nice to expose both `getPublishTime()` and `getEventTime()`
> directly in a consistent way.
>
> Finally, there will be zero overhead when the field is not set and very
> little when it's set.
>
> It's certain that an application can use a (string -> string) property,
> though it would be a custom field and it will need to be in string format
> and also carry the field name. I think event time is general enough to be
> considered as a built-in field and exposed in API.
>
> Matteo
>
> On Wed, Aug 23, 2017 at 4:23 PM Rajan Dhabalia <rd...@apache.org>
> wrote:
>
> > > because this field might be accessed when processing every message. the
> > overhead in key/value property is much higher than an int64 field.
> >
> > I have the same concern as Andrews mentioned.
> > But a very quick implementation question: properties
> > <
> > https://github.com/apache/incubator-pulsar/blob/master/
> pulsar-common/src/main/proto/PulsarApi.proto#L47
> > >
> > field is also part of message-metadata only.
> > So, I believe in your usecase, we want to skip metadata-deserialization
> and
> > idea is to read only "EventTime" field (field-12) without deserializing
> > message-metada using ByteBuf-offset (eg: read last int64 using end of
> > metadata-index).? Don't you think it will prevent us to add additional
> > String-field in future because then it will be hard to know "EventTime"
> > index?
> >
> > Thanks,
> > Rajan
> >
> >
> > On Wed, Aug 23, 2017 at 3:56 PM, Sijie Guo <gu...@gmail.com> wrote:
> >
> > > On Wed, Aug 23, 2017 at 3:19 PM, Sahaya Andrews <an...@apache.org>
> > > wrote:
> > >
> > > > If this is not going to be used by the broker for anything, why can't
> > > > this be set in message property key/value?
> > > >
> > >
> > > Currently it is not used by broker. However, we might leverage field
> > later
> > > on to provide advanced features like event-time based index. It is not
> > > covered by this proposal.
> > >
> > > Also there is another consideration - adding this field to message
> > metadata
> > > is more serialization friendly than putting it into key/value property.
> > > this is important for stream computing,
> > > because this field might be accessed when processing every message. the
> > > overhead in key/value property is much higher than an int64 field.
> > >
> > > Hope this make sense.
> > >
> > > - Sijie
> > >
> > >
> > > >
> > > > Andrews
> > > >
> > > > On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org>
> > wrote:
> > > > > Proposal looks good to me.
> > > > >
> > > > > Added to wiki:
> > > > > https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
> > > > >
> > > > > On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com>
> > wrote:
> > > > >
> > > > >> Hi pulsar folks,
> > > > >>
> > > > >> Currently Pulsar messages has fields for `publish_time`. However
> in
> > > some
> > > > >> use cases for example stream computing, `event_time` is required.
> > I'd
> > > > like
> > > > >> to start a proposal to add `event_time` support to pulsar
> messages.
> > > > >>
> > > > >> This proposal only covers adding `event_time` to pulsar messages
> and
> > > > make
> > > > >> it available to both producers and subscribers. It doesn't cover
> > > > advanced
> > > > >> features, such as event-time index or rewind subscriptions based
> on
> > > > >> event-time.
> > > > >>
> > > > >> I created the proposal here:
> > > > >> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
> > > > >>
> > > > >> Please go over and provide your feedback/comments.
> > > > >>
> > > > >> Also, since I don't have any permissions on writing pulsar wiki
> > pages,
> > > > can
> > > > >> any pulsar committers help me add this PIP to the wiki pages if it
> > is
> > > > >> accepted?
> > > > >>
> > > > >> - Sijie
> > > > >>
> > > > > --
> > > > > Matteo Merli
> > > > > <mm...@apache.org>
> > > >
> > >
> >
> --
> Matteo Merli
> <mm...@apache.org>
>

Re: Add EventTime to pulsar messages

Posted by Matteo Merli <ma...@gmail.com>.
> So, I believe in your usecase, we want to skip metadata-deserialization
and
> idea is to read only "EventTime" field (field-12) without deserializing
> message-metada using ByteBuf-offset (eg: read last int64 using end of
> metadata-index).? Don't you think it will prevent us to add additional
> String-field in future because then it will be hard to know "EventTime"
> index?

Didn't get this, can you clarify?

In my view, the event time is a common concept that can be useful in many
scenarios, every time there is the chance for the message being published
to refer to some event that happened before the publishing time.

It's also nice to expose both `getPublishTime()` and `getEventTime()`
directly in a consistent way.

Finally, there will be zero overhead when the field is not set and very
little when it's set.

It's certain that an application can use a (string -> string) property,
though it would be a custom field and it will need to be in string format
and also carry the field name. I think event time is general enough to be
considered as a built-in field and exposed in API.

Matteo

On Wed, Aug 23, 2017 at 4:23 PM Rajan Dhabalia <rd...@apache.org> wrote:

> > because this field might be accessed when processing every message. the
> overhead in key/value property is much higher than an int64 field.
>
> I have the same concern as Andrews mentioned.
> But a very quick implementation question: properties
> <
> https://github.com/apache/incubator-pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto#L47
> >
> field is also part of message-metadata only.
> So, I believe in your usecase, we want to skip metadata-deserialization and
> idea is to read only "EventTime" field (field-12) without deserializing
> message-metada using ByteBuf-offset (eg: read last int64 using end of
> metadata-index).? Don't you think it will prevent us to add additional
> String-field in future because then it will be hard to know "EventTime"
> index?
>
> Thanks,
> Rajan
>
>
> On Wed, Aug 23, 2017 at 3:56 PM, Sijie Guo <gu...@gmail.com> wrote:
>
> > On Wed, Aug 23, 2017 at 3:19 PM, Sahaya Andrews <an...@apache.org>
> > wrote:
> >
> > > If this is not going to be used by the broker for anything, why can't
> > > this be set in message property key/value?
> > >
> >
> > Currently it is not used by broker. However, we might leverage field
> later
> > on to provide advanced features like event-time based index. It is not
> > covered by this proposal.
> >
> > Also there is another consideration - adding this field to message
> metadata
> > is more serialization friendly than putting it into key/value property.
> > this is important for stream computing,
> > because this field might be accessed when processing every message. the
> > overhead in key/value property is much higher than an int64 field.
> >
> > Hope this make sense.
> >
> > - Sijie
> >
> >
> > >
> > > Andrews
> > >
> > > On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org>
> wrote:
> > > > Proposal looks good to me.
> > > >
> > > > Added to wiki:
> > > > https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
> > > >
> > > > On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com>
> wrote:
> > > >
> > > >> Hi pulsar folks,
> > > >>
> > > >> Currently Pulsar messages has fields for `publish_time`. However in
> > some
> > > >> use cases for example stream computing, `event_time` is required.
> I'd
> > > like
> > > >> to start a proposal to add `event_time` support to pulsar messages.
> > > >>
> > > >> This proposal only covers adding `event_time` to pulsar messages and
> > > make
> > > >> it available to both producers and subscribers. It doesn't cover
> > > advanced
> > > >> features, such as event-time index or rewind subscriptions based on
> > > >> event-time.
> > > >>
> > > >> I created the proposal here:
> > > >> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
> > > >>
> > > >> Please go over and provide your feedback/comments.
> > > >>
> > > >> Also, since I don't have any permissions on writing pulsar wiki
> pages,
> > > can
> > > >> any pulsar committers help me add this PIP to the wiki pages if it
> is
> > > >> accepted?
> > > >>
> > > >> - Sijie
> > > >>
> > > > --
> > > > Matteo Merli
> > > > <mm...@apache.org>
> > >
> >
>
-- 
Matteo Merli
<mm...@apache.org>

Re: Add EventTime to pulsar messages

Posted by Sijie Guo <gu...@gmail.com>.
On Wed, Aug 23, 2017 at 4:23 PM, Rajan Dhabalia <rd...@apache.org>
wrote:

> > because this field might be accessed when processing every message. the
> overhead in key/value property is much higher than an int64 field.
>
> I have the same concern as Andrews mentioned.
> But a very quick implementation question: properties
> <https://github.com/apache/incubator-pulsar/blob/master/
> pulsar-common/src/main/proto/PulsarApi.proto#L47>
> field is also part of message-metadata only.
> So, I believe in your usecase, we want to skip metadata-deserialization and
> idea is to read only "EventTime" field (field-12) without deserializing
> message-metada using ByteBuf-offset (eg: read last int64 using end of
> metadata-index).? Don't you think it will prevent us to add additional
> String-field in future because then it will be hard to know "EventTime"
> index?
>

Ah, not. In protobuf, the key/value properties are list of key/value pairs.
If you want to look up a `event-time` property,
you either do a sequential lookup or put the key/value pairs back into a
hashmap and look up. Either approach isn't really efficient if getEventTime
needs to happen on every message.

Also, it is going to introduce additional overhead for keys - every message
needs to carry a key `eventtime`.

Hope this make sense.


>
> Thanks,
> Rajan
>
>
> On Wed, Aug 23, 2017 at 3:56 PM, Sijie Guo <gu...@gmail.com> wrote:
>
> > On Wed, Aug 23, 2017 at 3:19 PM, Sahaya Andrews <an...@apache.org>
> > wrote:
> >
> > > If this is not going to be used by the broker for anything, why can't
> > > this be set in message property key/value?
> > >
> >
> > Currently it is not used by broker. However, we might leverage field
> later
> > on to provide advanced features like event-time based index. It is not
> > covered by this proposal.
> >
> > Also there is another consideration - adding this field to message
> metadata
> > is more serialization friendly than putting it into key/value property.
> > this is important for stream computing,
> > because this field might be accessed when processing every message. the
> > overhead in key/value property is much higher than an int64 field.
> >
> > Hope this make sense.
> >
> > - Sijie
> >
> >
> > >
> > > Andrews
> > >
> > > On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org>
> wrote:
> > > > Proposal looks good to me.
> > > >
> > > > Added to wiki:
> > > > https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
> > > >
> > > > On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com>
> wrote:
> > > >
> > > >> Hi pulsar folks,
> > > >>
> > > >> Currently Pulsar messages has fields for `publish_time`. However in
> > some
> > > >> use cases for example stream computing, `event_time` is required.
> I'd
> > > like
> > > >> to start a proposal to add `event_time` support to pulsar messages.
> > > >>
> > > >> This proposal only covers adding `event_time` to pulsar messages and
> > > make
> > > >> it available to both producers and subscribers. It doesn't cover
> > > advanced
> > > >> features, such as event-time index or rewind subscriptions based on
> > > >> event-time.
> > > >>
> > > >> I created the proposal here:
> > > >> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
> > > >>
> > > >> Please go over and provide your feedback/comments.
> > > >>
> > > >> Also, since I don't have any permissions on writing pulsar wiki
> pages,
> > > can
> > > >> any pulsar committers help me add this PIP to the wiki pages if it
> is
> > > >> accepted?
> > > >>
> > > >> - Sijie
> > > >>
> > > > --
> > > > Matteo Merli
> > > > <mm...@apache.org>
> > >
> >
>

Re: Add EventTime to pulsar messages

Posted by Rajan Dhabalia <rd...@apache.org>.
> because this field might be accessed when processing every message. the
overhead in key/value property is much higher than an int64 field.

I have the same concern as Andrews mentioned.
But a very quick implementation question: properties
<https://github.com/apache/incubator-pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto#L47>
field is also part of message-metadata only.
So, I believe in your usecase, we want to skip metadata-deserialization and
idea is to read only "EventTime" field (field-12) without deserializing
message-metada using ByteBuf-offset (eg: read last int64 using end of
metadata-index).? Don't you think it will prevent us to add additional
String-field in future because then it will be hard to know "EventTime"
index?

Thanks,
Rajan


On Wed, Aug 23, 2017 at 3:56 PM, Sijie Guo <gu...@gmail.com> wrote:

> On Wed, Aug 23, 2017 at 3:19 PM, Sahaya Andrews <an...@apache.org>
> wrote:
>
> > If this is not going to be used by the broker for anything, why can't
> > this be set in message property key/value?
> >
>
> Currently it is not used by broker. However, we might leverage field later
> on to provide advanced features like event-time based index. It is not
> covered by this proposal.
>
> Also there is another consideration - adding this field to message metadata
> is more serialization friendly than putting it into key/value property.
> this is important for stream computing,
> because this field might be accessed when processing every message. the
> overhead in key/value property is much higher than an int64 field.
>
> Hope this make sense.
>
> - Sijie
>
>
> >
> > Andrews
> >
> > On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org> wrote:
> > > Proposal looks good to me.
> > >
> > > Added to wiki:
> > > https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
> > >
> > > On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com> wrote:
> > >
> > >> Hi pulsar folks,
> > >>
> > >> Currently Pulsar messages has fields for `publish_time`. However in
> some
> > >> use cases for example stream computing, `event_time` is required. I'd
> > like
> > >> to start a proposal to add `event_time` support to pulsar messages.
> > >>
> > >> This proposal only covers adding `event_time` to pulsar messages and
> > make
> > >> it available to both producers and subscribers. It doesn't cover
> > advanced
> > >> features, such as event-time index or rewind subscriptions based on
> > >> event-time.
> > >>
> > >> I created the proposal here:
> > >> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
> > >>
> > >> Please go over and provide your feedback/comments.
> > >>
> > >> Also, since I don't have any permissions on writing pulsar wiki pages,
> > can
> > >> any pulsar committers help me add this PIP to the wiki pages if it is
> > >> accepted?
> > >>
> > >> - Sijie
> > >>
> > > --
> > > Matteo Merli
> > > <mm...@apache.org>
> >
>

Re: Add EventTime to pulsar messages

Posted by Sijie Guo <gu...@gmail.com>.
On Wed, Aug 23, 2017 at 3:19 PM, Sahaya Andrews <an...@apache.org> wrote:

> If this is not going to be used by the broker for anything, why can't
> this be set in message property key/value?
>

Currently it is not used by broker. However, we might leverage field later
on to provide advanced features like event-time based index. It is not
covered by this proposal.

Also there is another consideration - adding this field to message metadata
is more serialization friendly than putting it into key/value property.
this is important for stream computing,
because this field might be accessed when processing every message. the
overhead in key/value property is much higher than an int64 field.

Hope this make sense.

- Sijie


>
> Andrews
>
> On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org> wrote:
> > Proposal looks good to me.
> >
> > Added to wiki:
> > https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
> >
> > On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com> wrote:
> >
> >> Hi pulsar folks,
> >>
> >> Currently Pulsar messages has fields for `publish_time`. However in some
> >> use cases for example stream computing, `event_time` is required. I'd
> like
> >> to start a proposal to add `event_time` support to pulsar messages.
> >>
> >> This proposal only covers adding `event_time` to pulsar messages and
> make
> >> it available to both producers and subscribers. It doesn't cover
> advanced
> >> features, such as event-time index or rewind subscriptions based on
> >> event-time.
> >>
> >> I created the proposal here:
> >> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
> >>
> >> Please go over and provide your feedback/comments.
> >>
> >> Also, since I don't have any permissions on writing pulsar wiki pages,
> can
> >> any pulsar committers help me add this PIP to the wiki pages if it is
> >> accepted?
> >>
> >> - Sijie
> >>
> > --
> > Matteo Merli
> > <mm...@apache.org>
>

Re: Add EventTime to pulsar messages

Posted by Sahaya Andrews <an...@apache.org>.
If this is not going to be used by the broker for anything, why can't
this be set in message property key/value?

Andrews

On Wed, Aug 23, 2017 at 3:03 PM, Matteo Merli <mm...@apache.org> wrote:
> Proposal looks good to me.
>
> Added to wiki:
> https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time
>
> On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com> wrote:
>
>> Hi pulsar folks,
>>
>> Currently Pulsar messages has fields for `publish_time`. However in some
>> use cases for example stream computing, `event_time` is required. I'd like
>> to start a proposal to add `event_time` support to pulsar messages.
>>
>> This proposal only covers adding `event_time` to pulsar messages and make
>> it available to both producers and subscribers. It doesn't cover advanced
>> features, such as event-time index or rewind subscriptions based on
>> event-time.
>>
>> I created the proposal here:
>> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
>>
>> Please go over and provide your feedback/comments.
>>
>> Also, since I don't have any permissions on writing pulsar wiki pages, can
>> any pulsar committers help me add this PIP to the wiki pages if it is
>> accepted?
>>
>> - Sijie
>>
> --
> Matteo Merli
> <mm...@apache.org>

Re: Add EventTime to pulsar messages

Posted by Matteo Merli <mm...@apache.org>.
Proposal looks good to me.

Added to wiki:
https://github.com/apache/incubator-pulsar/wiki/PIP-5:-Event-time

On Wed, Aug 23, 2017 at 2:52 PM Sijie Guo <gu...@gmail.com> wrote:

> Hi pulsar folks,
>
> Currently Pulsar messages has fields for `publish_time`. However in some
> use cases for example stream computing, `event_time` is required. I'd like
> to start a proposal to add `event_time` support to pulsar messages.
>
> This proposal only covers adding `event_time` to pulsar messages and make
> it available to both producers and subscribers. It doesn't cover advanced
> features, such as event-time index or rewind subscriptions based on
> event-time.
>
> I created the proposal here:
> https://gist.github.com/sijie/60324cc892643961b923593a597109ab
>
> Please go over and provide your feedback/comments.
>
> Also, since I don't have any permissions on writing pulsar wiki pages, can
> any pulsar committers help me add this PIP to the wiki pages if it is
> accepted?
>
> - Sijie
>
-- 
Matteo Merli
<mm...@apache.org>