Posted to dev@metron.apache.org by Ryan Merriman <me...@gmail.com> on 2018/09/07 21:50:07 UTC

[DISCUSS] Internal Metron fields

I recently worked on a PR that involved changing the default behavior of
the ElasticsearchWriter to store data using field names with the default
Metron separator, dots.  One of the unfortunate consequences of this is
that although dots are allowed in more recent versions of ES, it changes
how these fields are stored.  Having a dot in a field name causes ES to
treat it as an object field type.  We're not quite comfortable with this
because it could introduce unforeseen side effects that may not be
obvious.  Here's the PR:  https://github.com/apache/metron/pull/1181
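To illustrate the storage change described above, here is a minimal sketch of how a flat document with dotted keys corresponds to a nested object structure. It is a plain Python illustration of the idea, not Elasticsearch or Metron code; the function name `expand_dotted_fields` and the sample values are assumptions made for the example.

```python
def expand_dotted_fields(flat_doc):
    """Expand dotted keys into nested dicts, e.g. {"a.b": 1} -> {"a": {"b": 1}}."""
    nested = {}
    for key, value in flat_doc.items():
        parts = key.split(".")
        node = nested
        # Walk (or create) one intermediate object per path segment.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(
                    f"cannot nest {key!r}: {part!r} already holds a plain value"
                )
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            raise ValueError(
                f"cannot store {key!r} as a value: it is already an object"
            )
        node[leaf] = value
    return nested


if __name__ == "__main__":
    doc = {"source.type": "bro", "threat.triage.score": 10.0}
    print(expand_dotted_fields(doc))
```

One side effect this makes visible: once `source.type` exists, `source` is an object, so a document that also carries a plain `source` value no longer fits the same structure.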

As I worked through it I noticed there are a couple fields that include
separators where it's not actually necessary.  They are not nested by
nature and are internal to Metron.  The fact that they are internal means
they show up in constants and are hardcoded in several different places.
That made the work in the PR above much harder and more tedious than it should
have been.  There are 2 in particular that I had to deal with:  source:type
and threat:triage:score in metaalerts.

Is it worth considering converting these to internal Metron fields so that
they stay constant and this isn't a problem in the future?  I could see
these fields following the same pattern as 'metron_alert'.  However this
would cause pain when upgrading because existing data would need to be
updated with these new fields.
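As a rough sketch of what the upgrade step could involve, the snippet below renames the two fields in an existing document. The target names `metron_source_type` and `metron_threat_triage_score` are hypothetical (no concrete names are proposed here beyond following the 'metron_alert' pattern), and `migrate_document` is an illustrative helper, not an existing Metron function.

```python
# Hypothetical old-name -> new-name mapping; both separator styles are covered.
INTERNAL_FIELD_RENAMES = {
    "source:type": "metron_source_type",
    "source.type": "metron_source_type",
    "threat:triage:score": "metron_threat_triage_score",
    "threat.triage.score": "metron_threat_triage_score",
}


def migrate_document(doc):
    """Return a copy of doc with the old field names replaced by the new ones."""
    migrated = {}
    for key, value in doc.items():
        new_key = INTERNAL_FIELD_RENAMES.get(key, key)
        if new_key in migrated:
            # Two source fields map to the same target; refuse to pick one silently.
            raise ValueError(f"duplicate target field {new_key!r} while migrating {key!r}")
        migrated[new_key] = value
    return migrated


if __name__ == "__main__":
    old = {"source:type": "metaalert", "threat:triage:score": 42.0, "guid": "abc"}
    print(migrate_document(old))
```

Every stored document would need a pass like this, which is the upgrade pain mentioned above.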

Just an idea.  Curious if others have an opinion on the subject.

Re: [DISCUSS] Internal Metron fields

Posted by Ali Nazemian <al...@gmail.com>.
Totally agree with replacing the dot with something else. We have had so much
drama using either a dot or a colon with ORC, whether via Hive or Spark.
Although we have replaced it with an underscore, that may not be a good idea
either, as it can be confused with underscores in the internal field names.
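The ambiguity with underscores can be shown with a small sketch: once separators are replaced by underscores, distinct original names may collapse into the same name. The helper `find_underscore_collisions` and the sample field names are made up for illustration.

```python
from collections import defaultdict


def find_underscore_collisions(field_names, separators=(".", ":")):
    """Group names that become identical after separators are replaced by '_'."""
    by_normalized = defaultdict(set)
    for name in field_names:
        normalized = name
        for sep in separators:
            normalized = normalized.replace(sep, "_")
        by_normalized[normalized].add(name)
    # Keep only the normalized names that more than one original maps to.
    return {
        normalized: sorted(originals)
        for normalized, originals in by_normalized.items()
        if len(originals) > 1
    }


if __name__ == "__main__":
    names = ["ip.src_addr", "ip_src.addr", "source.type"]
    print(find_underscore_collisions(names))
```

After the replacement there is also no way to tell which underscores were separators, so the original nesting cannot be recovered from the name alone.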

Cheers,
Ali

On Wed, Sep 12, 2018 at 8:18 AM James Sirota <js...@apache.org> wrote:

> I propose that we just disallow having dots in the field name.  Dots seem
> to have a special meaning and as we keep adding data stores we may run into
> some unintended behavior.  We should have logic in our code to check for it
> and either auto-correct it (replace with underscores?) or at least throw an
> error or a warning.
>
> Thanks,
> James

-- 
A.Nazemian

Re: [DISCUSS] Internal Metron fields

Posted by James Sirota <js...@apache.org>.
I propose that we just disallow having dots in the field name.  Dots seem to have a special meaning and as we keep adding data stores we may run into some unintended behavior.  We should have logic in our code to check for it and either auto-correct it (replace with underscores?) or at least throw an error or a warning.  
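A minimal sketch of such a check, covering the three behaviours mentioned (auto-correct, warn, error). The function name `check_field_name` and the `on_dot` parameter are assumptions for illustration, not existing Metron code.

```python
import warnings

VALID_MODES = ("replace", "warn", "error")


def check_field_name(name, on_dot="replace"):
    """Validate a field name and handle dots according to on_dot."""
    if on_dot not in VALID_MODES:
        raise ValueError(f"unknown on_dot mode: {on_dot!r}")
    if "." not in name:
        return name
    if on_dot == "replace":
        # Auto-correct: swap every dot for an underscore.
        return name.replace(".", "_")
    if on_dot == "warn":
        warnings.warn(f"field name contains a dot: {name!r}")
        return name
    raise ValueError(f"field name must not contain a dot: {name!r}")


if __name__ == "__main__":
    print(check_field_name("threat.triage.score"))
    print(check_field_name("ip_src_addr", on_dot="error"))
```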

Thanks,
James 

07.09.2018, 16:33, "Ryan Merriman" <me...@gmail.com>:
> Internal means it’s not configurable, doesn’t contain our default separator (dots) and is namespaced with metron. We can definitely improve on DRY but there’s more to it than that. For example, having 2 different versions of this field name (ES and Solr) adds a significant amount of complexity for no real benefit.

------------------- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org


Re: [DISCUSS] Internal Metron fields

Posted by Ryan Merriman <me...@gmail.com>.
Internal means it’s not configurable, doesn’t contain our default separator (dots) and is namespaced with metron.  We can definitely improve on DRY but there’s more to it than that.  For example, having 2 different versions of this field name (ES and Solr) adds a significant amount of complexity for no real benefit.
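To make that definition concrete, here is a small sketch of a single shared definition plus a check for the naming rule. The `metron_` prefix is inferred from 'metron_alert', and the names `METRON_PREFIX`, `SEPARATORS` and `is_valid_internal_name` are hypothetical.

```python
METRON_PREFIX = "metron_"   # assumed namespace, inferred from 'metron_alert'
SEPARATORS = (".", ":")     # separators an internal name must not contain

# One shared set of internal names, used for every index type (ES and Solr alike).
INTERNAL_FIELDS = frozenset({"metron_alert"})


def is_valid_internal_name(name):
    """True if name is namespaced with the prefix and contains no separator."""
    return (
        name.startswith(METRON_PREFIX)
        and len(name) > len(METRON_PREFIX)
        and not any(sep in name for sep in SEPARATORS)
    )


if __name__ == "__main__":
    for field in sorted(INTERNAL_FIELDS):
        print(field, is_valid_internal_name(field))
    print("source:type", is_valid_internal_name("source:type"))
```

Because the name is fixed and contains no separator, there is nothing to translate per data store, which is where the second version of the field name comes from today.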

> On Sep 7, 2018, at 5:12 PM, Michael Miklavcic <mi...@gmail.com> wrote:
> 
> Can you elaborate on what you mean by "convert to internal?" From your
> description, it looks like the challenge is from our violations of DRY when
> it comes to constants referencing those keys, which would be eliminated by
> refactoring.

Re: [DISCUSS] Internal Metron fields

Posted by Michael Miklavcic <mi...@gmail.com>.
Can you elaborate on what you mean by "convert to internal?" From your
description, it looks like the challenge is from our violations of DRY when
it comes to constants referencing those keys, which would be eliminated by
refactoring.

On Fri, Sep 7, 2018, 3:50 PM Ryan Merriman <me...@gmail.com> wrote:

> I recently worked on a PR that involved changing the default behavior of
> the ElasticsearchWriter to store data using field names with the default
> Metron separator, dots.  One of the unfortunate consequences of this is
> that although dots are allowed in more recent versions of ES, it changes
> how these fields are stored.  Having a dot in a field name causes ES to
> treat it as an object field type.  We're not quite comfortable with this
> because it could introduce unforeseen side effects that may not be
> obvious.  Here's the PR:  https://github.com/apache/metron/pull/1181
>
> As I worked through it I noticed there are a couple fields that include
> separators where it's not actually necessary.  They are not nested by
> nature and are internal to Metron.  The fact that they are internal means
> they show up in constants and are hardcoded in several different places.
> That made the work in the PR above much harder and more tedious than it should
> have been.  There are 2 in particular that I had to deal with:  source:type
> and threat:triage:score in metaalerts.
>
> Is it worth considering converting these to internal Metron fields so that
> they stay constant and this isn't a problem in the future?  I could see
> these fields following the same pattern as 'metron_alert'.  However this
> would cause pain when upgrading because existing data would need to be
> updated with these new fields.
>
> Just an idea.  Curious if others have an opinion on the subject.
>