Posted to dev@streampipes.apache.org by Philipp Zehnder <ze...@apache.org> on 2020/08/04 17:40:37 UTC

Timestamp in event

Hi all,

I am currently reworking the schema editor in Connect to work with the newly generated model. 
The following question came up: Should we ensure that there is a timestamp in the event? 
I.e. users have to add a timestamp or mark a property as a timestamp. 

What do you think?

Philipp

Re: Timestamp in event

Posted by Patrick Wiener <wi...@apache.org>.
I agree: this is more of a general question, as @Philipp already pointed out.

I agree with @Dominik's suggestion and think that we definitely need the feature to actively ask users or display a warning when a timestamp field is missing.

Coming back to the more general aspect: it is in the nature of an event that it occurs or is created at a certain point in “time”. Knowing this point in time is crucial
to deduce contextual knowledge about situations; a measurement at time T1 might “mean” something totally different than a measurement at time T2.
This matters especially for windowing, streaming joins, etc.

Thus, I would suggest the following:

1. Always actively inform the user that a timestamp field is required.
2a. If provided in the raw event stream: mark it (event time); it may need transformation using date format strings etc.
2b. If not provided in the raw event stream: add it (ingestion time, i.e. when the event is processed by the Connect worker instance).

Internally, we then represent all timestamps as UNIX timestamps.
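
To make cases 2a and 2b concrete, here is a minimal Python sketch. It is not StreamPipes code: the function name `ensure_timestamp`, the output field name `timestamp`, the use of milliseconds, and the choice to interpret date strings without a zone as UTC are all assumptions made for illustration.

```python
import time
from datetime import datetime, timezone


def ensure_timestamp(event, field=None, date_format=None, now_ms=None):
    """Return a copy of `event` with a UNIX timestamp (ms) under "timestamp".

    Case 2a: `field` names an existing property that holds the event time.
             If `date_format` is given, the value is parsed with strptime
             and (assumption) interpreted as UTC; otherwise it is taken to
             be UNIX milliseconds already.
    Case 2b: no usable property exists, so the ingestion time is added.
    """
    result = dict(event)
    if field is not None and field in event:
        raw = event[field]
        if date_format is not None:
            parsed = datetime.strptime(str(raw), date_format)
            parsed = parsed.replace(tzinfo=timezone.utc)
            result["timestamp"] = int(parsed.timestamp() * 1000)
        else:
            result["timestamp"] = int(raw)
        return result
    # Ingestion time: wall-clock time of the worker, injectable for testing.
    result["timestamp"] = now_ms if now_ms is not None else int(time.time() * 1000)
    return result
```

Calling `ensure_timestamp(event, field="t", date_format="%Y-%m-%d %H:%M:%S")` covers 2a, while `ensure_timestamp(event)` falls back to 2b.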

As for the suggestion with the index: we could do that, but it doesn’t feel intuitive to me. Since we would only use the index as an indicator of the seconds elapsed
since the adapter was created, you could also just use the aforementioned method (2b) to simply add a timestamp on adapter creation using the wall-clock time
of the Connect worker instance.

Doesn’t this cover all cases, or am I missing some?

Patrick


> On 06.08.2020 at 23:14, Philipp Zehnder <ze...@apache.org> wrote:
> 
> Hi Marco,
> 
> do you mean this as a solution for all adapters, or for the file stream adapter?
> 
> If you mean it for the file stream adapter, then I would suggest that we mention in the documentation that users should add an index column,
> mark it as a timestamp, and provide the regex “s” (each number is then interpreted as seconds).
> I like the idea of using the line index, but I do not know how we could implement this generically for all the different formats. Do you have an idea?
> 
> Philipp


Re: Timestamp in event

Posted by Philipp Zehnder <ze...@apache.org>.
Hi Marco,

do you mean this as a solution for all adapters, or for the file stream adapter?

If you mean it for the file stream adapter, then I would suggest that we mention in the documentation that users should add an index column,
mark it as a timestamp, and provide the regex “s” (each number is then interpreted as seconds).
I like the idea of using the line index, but I do not know how we could implement this generically for all the different formats. Do you have an idea?

Philipp

> On 6. Aug 2020, at 16:03, Marco Heyden <he...@gmail.com> wrote:
> 
> Hey, 
> 
> Maybe another option would be to use the data index as a default timestamp, if no other timestamp is provided. Then one could specify a sampling frequency and obtain the relative time since the start of recording.
> 
> What do you think?
> 
> Best
> Marco


Re: Timestamp in event

Posted by Marco Heyden <he...@gmail.com>.
Hey, 

Maybe another option would be to use the data index as a default timestamp, if no other timestamp is provided. Then one could specify a sampling frequency and obtain the relative time since the start of recording.
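
A rough Python sketch of this idea, purely for illustration: the helper names, the `timestamp` field, and the millisecond unit are assumptions, not existing Connect functionality.

```python
def index_to_relative_ms(index, sampling_hz):
    """Relative time in ms of the sample at position `index` (0-based)."""
    if sampling_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    return round(index * 1000 / sampling_hz)


def with_index_timestamps(rows, sampling_hz, start_ms=0):
    """Attach a timestamp derived from the row index to each row.

    `start_ms` is the (assumed) start of the recording; with the default
    of 0 the result is the relative time since the start of recording.
    """
    return [
        dict(row, timestamp=start_ms + index_to_relative_ms(i, sampling_hz))
        for i, row in enumerate(rows)
    ]
```

With a sampling frequency of 2 Hz, for example, the row at index 3 would be placed 1500 ms after the start of the recording.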

What do you think?

Best
Marco

On 06.08.20 at 15:57, "Philipp Zehnder" <ze...@apache.org> wrote:

    Hi,

    this is a general question. Do we want a timestamp in each event?
    I think it makes sense to have a timestamp in each event, because then we always know when it occurred. When there is no timestamp in the data, it can be added in the adapter. What is your opinion on that?

    With Connect, we have one case where a timestamp is required.
    For the file stream adapter, we use the timestamp to replay the events according to the offset between the timestamps in the events in the file.
    This enables us to simulate the original data stream.
    Therefore, we need a timestamp in the event schema. The event schema component is independent of the adapter used, so we do not know whether the timestamp is required or not.

    Philipp



Re: Timestamp in event

Posted by Philipp Zehnder <ze...@apache.org>.
Hi,

this is a general question. Do we want a timestamp in each event?
I think it makes sense to have a timestamp in each event, because then we always know when it occurred. When there is no timestamp in the data, it can be added in the adapter. What is your opinion on that?

With Connect, we have one case where a timestamp is required.
For the file stream adapter, we use the timestamp to replay the events according to the offset between the timestamps in the events in the file.
This enables us to simulate the original data stream.
Therefore, we need a timestamp in the event schema. The event schema component is independent of the adapter used, so we do not know whether the timestamp is required or not.
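
For readers unfamiliar with the replay behaviour described above, here is a minimal Python sketch of the general technique. It is not the actual file stream adapter; the names `replay_delays` and `replay`, the `speed` factor, and the assumption that timestamps are UNIX milliseconds are all hypothetical.

```python
import time


def replay_delays(events, field="timestamp", speed=1.0):
    """Yield (delay_seconds, event) pairs for an ordered list of events.

    The delay before each event is the offset between its timestamp and the
    previous event's timestamp (assumed to be in ms), scaled by `speed`.
    Out-of-order timestamps are clamped to a delay of zero.
    """
    previous = None
    for event in events:
        current = event[field]
        if previous is None:
            delay = 0.0
        else:
            delay = max(0.0, (current - previous) / 1000.0 / speed)
        previous = current
        yield delay, event


def replay(events, emit, field="timestamp", speed=1.0, sleep=time.sleep):
    """Emit each event after waiting for its offset to the previous event."""
    for delay, event in replay_delays(events, field, speed):
        sleep(delay)
        emit(event)
```

Passing `print` as `emit` replays a file's events at their original pace; `speed=2.0` would halve every wait.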

Philipp


> On 6. Aug 2020, at 10:31, Dominik Riemer <ri...@apache.org> wrote:
> 
> Hi,
> 
> Is there an advantage to requiring a timestamp in every event? Maybe we could initially just display a warning or actively ask users in Connect when a timestamp is missing, and then enforce the addition of timestamps in one of the following releases.
> 
> Dominik


Re: Timestamp in event

Posted by Dominik Riemer <ri...@apache.org>.
Hi,

Is there an advantage to requiring a timestamp in every event? Maybe we could initially just display a warning or actively ask users in Connect when a timestamp is missing, and then enforce the addition of timestamps in one of the following releases.

Dominik

On 2020/08/04 17:52:07, Patrick Wiener <wi...@apache.org> wrote: 
> Hi Philipp,
> 
> I think it is definitely a valuable feature to check for the existence of a timestamp before creating the adapter, since we have various processors and sinks that rely on a timestamp.
>
> One possible solution could be to notify users immediately if a timestamp field is missing, e.g. in a dialog.
> 
> 
> Patrick

Re: Timestamp in event

Posted by Patrick Wiener <wi...@apache.org>.
Hi Philipp,

I think it is definitely a valuable feature to check for the existence of a timestamp before creating the adapter, since we have various processors and sinks that rely on a timestamp.

One possible solution could be to notify users immediately if a timestamp field is missing, e.g. in a dialog.
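
Such a check could look roughly like the following Python sketch. The schema representation here (a list of dicts with hypothetical `name` and `is_timestamp` keys) is an assumption for illustration and not the actual Connect event schema model.

```python
def find_timestamp_properties(schema):
    """Return the names of all properties marked as timestamps."""
    return [prop["name"] for prop in schema if prop.get("is_timestamp")]


def timestamp_warning(schema):
    """Return a warning message if no property is marked as a timestamp.

    Returns None when at least one timestamp property exists, so the caller
    (e.g. a dialog in the UI) only needs to act on a non-None result.
    """
    if find_timestamp_properties(schema):
        return None
    return (
        "No timestamp found in the event schema. "
        "Please mark a property as timestamp or add one."
    )
```

The caller would run this before creating the adapter and show the returned message to the user, if any.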


Patrick

> On 04.08.2020 at 19:40, Philipp Zehnder <ze...@apache.org> wrote:
> 
> Hi all,
> 
> I am currently reworking the schema editor in Connect to work with the newly generated model. 
> The following question came up: Should we ensure that there is a timestamp in the event? 
> I.e. users have to add a timestamp or mark a property as a timestamp. 
> 
> What do you think?
> 
> Philipp

