Posted to dev@metron.apache.org by "Kumar, Sunny" <Su...@capitalone.com> on 2016/05/27 17:29:13 UTC

Standardizing the timestamp in the parsed output of Metron

Hi all

Here at Capital One we want to standardize the timestamp of each telemetry tuple to UTC, and we think this might be useful for Metron in general.
If you agree, we'll be happy to contribute this back to Metron. In that case, I would also like to have a quick architecture discussion about the use case.
Please let me know what you think.

Thanks,
Sunny
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Standardizing the timestamp in the parsed output of Metron

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
Additionally, there is the (already implied) concern of RFC 1918 space or
external IP blocks allocated to a company that is spread out geographically.
Perhaps an internal lookup using DDI (DNS/DHCP/IPAM) data would work, if
that data includes a geo region or some kind of tag such as datacenter,
building name, or address.  If not, perhaps something simple that allows
the user to configure 192.168.1.0/24 through 192.168.10.0/24 and
10.0.0.0/8 = "datacenter a", 192.168.254.0/24 = "datacenter b".  In either
case, step two would map the tags ("datacenter a", "datacenter b", etc.)
to timezones.
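
A minimal Python sketch of the two-step lookup described above (network to tag, then tag to timezone). The networks, tags, and zone names here are hypothetical configuration values chosen for illustration, and `zone_for_ip` is an invented helper, not an existing Metron function.

```python
import ipaddress

# Hypothetical configuration, step one: network -> site tag.
NETWORK_TAGS = {
    "10.0.0.0/8": "datacenter a",
    "192.168.1.0/24": "datacenter a",
    "192.168.254.0/24": "datacenter b",
}

# Hypothetical configuration, step two: site tag -> IANA timezone name.
TAG_ZONES = {
    "datacenter a": "America/New_York",
    "datacenter b": "America/Chicago",
}

# Assumed fallback for addresses that match no configured network.
DEFAULT_ZONE = "UTC"

# Parse the networks once and check the most specific prefix first.
_NETWORKS = sorted(
    ((ipaddress.ip_network(cidr), tag) for cidr, tag in NETWORK_TAGS.items()),
    key=lambda pair: pair[0].prefixlen,
    reverse=True,
)


def zone_for_ip(ip_text):
    """Return the timezone name configured for the network containing ip_text."""
    ip = ipaddress.ip_address(ip_text)
    for network, tag in _NETWORKS:
        if ip in network:
            return TAG_ZONES.get(tag, DEFAULT_ZONE)
    return DEFAULT_ZONE


print(zone_for_ip("10.1.2.3"))       # America/New_York
print(zone_for_ip("192.168.254.7"))  # America/Chicago
print(zone_for_ip("8.8.8.8"))        # UTC
```

Keeping the tag as an intermediate value means the same network-to-tag table could be filled from DDI data or by hand without touching the tag-to-timezone step.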

On Fri, May 27, 2016, 14:21 Nick Allen <ni...@nickallen.org> wrote:

> Having the user manipulate timezone offsets is tricky too.  The particular
> offset for any one time zone can change throughout the year based on
> daylight saving time, for example.
>
> It might be easier to allow the user to indicate the timezone (EDT, CDT,
> etc.) and then have logic in the platform that knows how to convert to UTC
> at any given point in time.
>
> On Fri, May 27, 2016 at 1:39 PM, Puzio, Domenic <
> Domenic.Puzio@capitalone.com> wrote:
>
> > For me, I think this work needs to address two situations:
> >
> >
> >
> > 1. Data that is always coming from the same timezone. If a data source
> > always has its timestamp in EST, then we need to add a static offset to
> > each timestamp to make it UTC. Perhaps this could be a variable in the
> > parser configs; I believe some parsers already use a “withTimezone”
> > configuration method.
> >
> > 2. Data that comes in from multiple timezones. This is the trickier case;
> > we want to add or subtract an offset to get the timestamp to UTC, but this
> > offset could be different from record to record. We could compare the log’s
> > timestamp to the system timestamp to get a guess at the log’s timezone, but
> > I’m not sure how reliable and efficient this would be.
> >
> > Let me know what you all think about these two cases.
> >
> > Domenic
> >
>
>
>
> --
> Nick Allen <ni...@nickallen.org>
>
-- 

Jon

Re: Standardizing the timestamp in the parsed output of Metron

Posted by Nick Allen <ni...@nickallen.org>.
Having the user manipulate timezone offsets is tricky too.  The particular
offset for any one time zone can change throughout the year based on
daylight saving time, for example.

It might be easier to allow the user to indicate the timezone (EDT, CDT,
etc.) and then have logic in the platform that knows how to convert to UTC
at any given point in time.
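
To illustrate the point about offsets changing during the year, here is a small Python sketch that uses the standard-library `zoneinfo` module (Python 3.9+, which needs a timezone database available on the system). The helper name `local_to_utc` and the choice of the "America/New_York" zone are assumptions made for the example only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def local_to_utc(naive_local, zone_name):
    """Interpret a naive timestamp in the named zone and convert it to UTC."""
    aware = naive_local.replace(tzinfo=ZoneInfo(zone_name))
    return aware.astimezone(timezone.utc)


# Same wall-clock time and same configured zone, but different UTC offsets.
winter = local_to_utc(datetime(2016, 1, 15, 12, 0, 0), "America/New_York")
summer = local_to_utc(datetime(2016, 5, 27, 12, 0, 0), "America/New_York")

print(winter.isoformat())  # 2016-01-15T17:00:00+00:00
print(summer.isoformat())  # 2016-05-27T16:00:00+00:00
```

The user only names the zone; the timezone database supplies the offset that applied on each date.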

On Fri, May 27, 2016 at 1:39 PM, Puzio, Domenic <
Domenic.Puzio@capitalone.com> wrote:

> For me, I think this work needs to address two situations:
>
>
>
> 1. Data that is always coming from the same timezone. If a data source
> always has its timestamp in EST, then we need to add a static offset to
> each timestamp to make it UTC. Perhaps this could be a variable in the
> parser configs; I believe some parsers already use a “withTimezone”
> configuration method.
>
> 2. Data that comes in from multiple timezones. This is the trickier case;
> we want to add or subtract an offset to get the timestamp to UTC, but this
> offset could be different from record to record. We could compare the log’s
> timestamp to the system timestamp to get a guess at the log’s timezone, but
> I’m not sure how reliable and efficient this would be.
>
> Let me know what you all think about these two cases.
>
> Domenic
>
>



-- 
Nick Allen <ni...@nickallen.org>

Re: Standardizing the timestamp in the parsed output of Metron

Posted by Nick Allen <ni...@nickallen.org>.
>
> 2. Data that comes in from multiple timezones. This is the trickier case;
> we want to add or subtract an offset to get the timestamp to UTC, but this
> offset could be different from record to record. We could compare the log’s
> timestamp to the system timestamp to get a guess at the log’s timezone, but
> I’m not sure how reliable and efficient this would be.


I completely agree that this will be a challenge that we should solve.  Do
you have a specific example that you could bring up?  This might help
provide some structure around the solution.

An alternative approach would be to use another field in the data to
indicate the offset. For example, maybe I have a location indicator in the
data that can help me get at the right offset.

if state = OH then offset = -4

if state = WI then offset = -5
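
The pseudocode above could be sketched in Python roughly as follows. The record layout (a `state` field and a `timestamp` field in `YYYY-MM-DD HH:MM:SS` form) and the helper name `to_utc` are assumptions for illustration; the two offsets are the ones from the example and only hold while daylight time is in effect.

```python
from datetime import datetime, timedelta, timezone

# Offsets in hours, taken from the example above (valid during daylight time).
STATE_OFFSET_HOURS = {"OH": -4, "WI": -5}


def to_utc(record):
    """Return the record's timestamp as an aware UTC datetime.

    Returns None when the record's state has no configured offset.
    """
    hours = STATE_OFFSET_HOURS.get(record.get("state"))
    if hours is None:
        return None
    local = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    aware = local.replace(tzinfo=timezone(timedelta(hours=hours)))
    return aware.astimezone(timezone.utc)


ohio = to_utc({"state": "OH", "timestamp": "2016-05-27 13:29:13"})
wisconsin = to_utc({"state": "WI", "timestamp": "2016-05-27 12:29:13"})
print(ohio.isoformat())       # 2016-05-27T17:29:13+00:00
print(wisconsin.isoformat())  # 2016-05-27T17:29:13+00:00
```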


Something a little deeper to think about is whether an enrichment source
could be used to solve this problem for more challenging scenarios.
Imagine we have a data source with no location indicators, but we do have
an IP.  We could do a geoip lookup on the IP and then use the geo-enriched
data to determine the correct offset, as in the example above.

Re: Standardizing the timestamp in the parsed output of Metron

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
In addition, it would need to apply a configured default timezone when
processing untagged data sources, such as anything using RFC 3164, whose
timestamps carry no timezone.
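
As a rough Python sketch of that idea: an RFC 3164 timestamp has the form `Mmm dd hh:mm:ss`, with neither a year nor a timezone, so both have to be supplied from outside the message. The function name `parse_rfc3164` and its `default_tz` and `year` parameters are hypothetical, and a fixed UTC-4 default is used purely as an example value.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3164(text, default_tz, year):
    """Parse 'Mmm dd hh:mm:ss' using an assumed year and default timezone.

    Returns an aware datetime converted to UTC.
    """
    naive = datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")
    return naive.replace(tzinfo=default_tz).astimezone(timezone.utc)


# Example default for a source known to log in UTC-4 (an assumption).
default_tz = timezone(timedelta(hours=-4))
parsed = parse_rfc3164("May 27 13:29:13", default_tz, 2016)
print(parsed.isoformat())  # 2016-05-27T17:29:13+00:00
```

A real implementation would also have to decide how to pick the year, for example around a December to January rollover.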

On Fri, May 27, 2016, 13:40 Puzio, Domenic <Do...@capitalone.com>
wrote:

> For me, I think this work needs to address two situations:
>
>
>
> 1. Data that is always coming from the same timezone. If a data source
> always has its timestamp in EST, then we need to add a static offset to
> each timestamp to make it UTC. Perhaps this could be a variable in the
> parser configs; I believe some parsers already use a “withTimezone”
> configuration method.
>
> 2. Data that comes in from multiple timezones. This is the trickier case;
> we want to add or subtract an offset to get the timestamp to UTC, but this
> offset could be different from record to record. We could compare the log’s
> timestamp to the system timestamp to get a guess at the log’s timezone, but
> I’m not sure how reliable and efficient this would be.
>
> Let me know what you all think about these two cases.
>
> Domenic
>
>
-- 

Jon

Re: Standardizing the timestamp in the parsed output of Metron

Posted by Nick Allen <ni...@nickallen.org>.
Everyone is going to experience this problem; I guarantee it.  Any
contributions you can make to solve this problem will be huge, massive,
epic!  Thank you for opening the discussion.



On Fri, May 27, 2016 at 1:39 PM, Puzio, Domenic <
Domenic.Puzio@capitalone.com> wrote:

> For me, I think this work needs to address two situations:
>
>
>
> 1. Data that is always coming from the same timezone. If a data source
> always has its timestamp in EST, then we need to add a static offset to
> each timestamp to make it UTC. Perhaps this could be a variable in the
> parser configs; I believe some parsers already use a “withTimezone”
> configuration method.
>
> 2. Data that comes in from multiple timezones. This is the trickier case;
> we want to add or subtract an offset to get the timestamp to UTC, but this
> offset could be different from record to record. We could compare the log’s
> timestamp to the system timestamp to get a guess at the log’s timezone, but
> I’m not sure how reliable and efficient this would be.
>
> Let me know what you all think about these two cases.
>
> Domenic
>
>



-- 
Nick Allen <ni...@nickallen.org>

Re: Standardizing the timestamp in the parsed output of Metron

Posted by "Puzio, Domenic" <Do...@capitalone.com>.
For me, I think this work needs to address two situations:



1. Data that is always coming from the same timezone. If a data source always has its timestamp in EST, then we need to add a static offset to each timestamp to make it UTC. Perhaps this could be a variable in the parser configs; I believe some parsers already use a “withTimezone” configuration method.

2. Data that comes in from multiple timezones. This is the trickier case; we want to add or subtract an offset to get the timestamp to UTC, but this offset could be different from record to record. We could compare the log’s timestamp to the system timestamp to get a guess at the log’s timezone, but I’m not sure how reliable and efficient this would be.
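
For the second case, the "compare to the system timestamp" idea could look something like the Python sketch below. It assumes that logs arrive shortly after they are written and that clocks are roughly in sync, and it rounds the difference to the nearest quarter hour; `guess_utc_offset` is a made-up name. Delayed or batched logs would defeat this heuristic.

```python
from datetime import datetime, timedelta

QUARTER_HOUR = timedelta(minutes=15)


def guess_utc_offset(log_local, received_utc):
    """Guess the log's UTC offset from naive local and UTC receive times.

    The difference is rounded to the nearest 15 minutes so that small
    delivery delays do not change the result.
    """
    delta = log_local - received_utc
    quarters = round(delta / QUARTER_HOUR)
    return quarters * QUARTER_HOUR


log_local = datetime(2016, 5, 27, 13, 29, 10)     # timestamp inside the log
received_utc = datetime(2016, 5, 27, 17, 29, 13)  # system clock, in UTC

offset = guess_utc_offset(log_local, received_utc)
print(offset == timedelta(hours=-4))  # True
print(log_local - offset)             # 2016-05-27 17:29:10
```

Subtracting the guessed offset from the log's timestamp gives the UTC value, here three seconds before the receive time.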

Let me know what you all think about these two cases.

Domenic
