Posted to user@metron.apache.org by "Yerex, Tom" <to...@ubc.ca> on 2019/11/01 16:41:51 UTC

Fields with a period/dot in the name

Good day to everyone. I'm working on our own variation of the Geographic Login Outliers use case (https://metron.apache.org/current-book/use-cases/geographic_login_outliers/index.html). I noticed that our field names arrive with a period in the name, for example "client.ip" and "user.id".

 

Our internal naming convention is intended to align the data ingestion solution with the Elastic Common Schema. From experience, working with those dots in Elasticsearch is a challenge, and it raises the question of whether we need to handle field names with a dot differently in Metron.

 

In the case of Metron, should we be modifying the field names to replace dots? Can the Metron STELLAR language handle a field name with a dot in it, or are there any special steps required such as surrounding event fields with single or double-quotes in order to properly handle those field names?

 

Thank you,

 

Tom.

 


Re: Fields with a period/dot in the name

Posted by "Yerex, Tom" <to...@ubc.ca>.
Thank you, Nick! As far as I can tell, we have a working implementation; now I need to put together test data to validate everything.

 

Cheers,

 

Tom.

 

From: Nick Allen <ni...@nickallen.org>
Reply-To: "user@metron.apache.org" <us...@metron.apache.org>
Date: Friday, November 1, 2019 at 11:02 AM
To: "user@metron.apache.org" <us...@metron.apache.org>
Subject: Re: Fields with a period/dot in the name

 

Hi Tom -

 

> In the case of Metron, should we be modifying the field names to replace dots? Can the Metron STELLAR language handle a field name with a dot in it, or are there any special steps required such as surrounding event fields with single or double-quotes in order to properly handle those field names? 

 

I cannot think of any facilities within Metron itself that would have difficulties with periods in field names.

 

 

> I noticed that our field names arrive with a period in the name, for example "client.ip" and "user.id"... Our internal naming convention is intended to align the data ingestion solution with the Elastic Common Schema. From experience, working with those dots in Elasticsearch is a challenge

 

You can use Metron to translate the field names however you like; for example, replace "client.ip" with "client_ip". There are some examples of this in the Parsers documentation [1]. Look under the section "fieldTransformation configuration".

 

 

---

 

[1] https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html

 

 

 

 

 


 


Re: Fields with a period/dot in the name

Posted by Владимир Михайлов <v....@content-media.ru>.
> The overall approach with Metron has been to flatten nested fields,
> rather than deal with deeply nested structures.  Most of the parsers are
> built to flatten the data. This presents a "flat" view for all downstream
> functionality like Enrichment, Profiling, etc. I assume in your example
> that whatever parser you are using is not flattening the data.  Is that
> correct?

Yes, we receive ready-made JSON from Auditbeat and Winlogbeat and do not plan
to flatten it. Our goal with Metron is to enrich this data, run it against
threat intelligence (TI) feeds and the Profiler, and then index it in
Elasticsearch.

> Using a map literal would simplify your example a bit.

> system := { 'id': MAP_GET('id', MAP_GET('os', host_id)) }


If I understand correctly, this expression creates a new 'system' object
(map) with a single 'id' property. But we often need to change just one
property of a more complex object while keeping the rest of it intact.
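The distinction between building a new map and updating one key of an existing map can be sketched in Python, with plain dicts standing in for Stellar maps. This is an analogy only, not Stellar semantics; the field values are made up for illustration:

```python
from typing import Any, Dict

host_id: Dict[str, Any] = {"os": {"id": "win10"}}
system: Dict[str, Any] = {"name": "host-1", "id": None}

# Like a map literal: builds a brand-new map, so "name" is lost.
replaced = {"id": host_id["os"]["id"]}

# Like updating one key of the existing map: "name" survives.
updated = {**system, "id": host_id["os"]["id"]}

print(replaced)  # {'id': 'win10'}
print(updated)   # {'name': 'host-1', 'id': 'win10'}
```

Neither line mutates `system`; the second simply copies its keys before overriding "id".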

> If we complete METRON-2072
> <https://issues.apache.org/jira/browse/METRON-2072?jql=text%20~%20%22Stellar%20Map%22>,
> adding some syntactic sugar around MAP_PUT/GET, then your example could be
> much simpler.
>
> system := { 'id': host_id['os']['id'] }


That would be great, especially if something like this were implemented:

system['id'] := host_id['os']['id']




On Sat, Nov 16, 2019 at 02:09, Nick Allen <ni...@nickallen.org> wrote:

> Hi Vladimir -
>
> > Converting ECS to flat JSON, where the fields take the form {"system.id":
> > "<value>"}, is not a good option, because it defeats the purpose of ECS
> > and loses the convenience of the JSON format.
>
> Right, it just depends on your use case.  My hope is that with the
> facilities in Metron, you can manipulate the data in whatever manner works
> best for you.
>
>
> > And with deep nesting, this generally turns into unreadable,
> > hard-to-maintain code.
>
> The overall approach with Metron has been to flatten nested fields, rather
> than deal with deeply nested structures.  Most of the parsers are built to
> flatten the data. This presents a "flat" view for all downstream
> functionality like Enrichment, Profiling, etc. I assume in your example
> that whatever parser you are using is not flattening the data.  Is that
> correct?
>
>
> > So my question is: is there an easy way to work with nested JSON in
> > Stellar? A deep dive into the documentation and source code has not yet
> > turned up an answer.
> >
> >     system := MAP_PUT('id', MAP_GET('id',MAP_GET('os',host_id)), system)
> >
>
> Using a map literal would simplify your example a bit.
>
> system := { 'id': MAP_GET('id', MAP_GET('os', host_id)) }
>
>
> If we complete METRON-2072
> <https://issues.apache.org/jira/browse/METRON-2072?jql=text%20~%20%22Stellar%20Map%22>,
> adding some syntactic sugar around MAP_PUT/GET, then your example could be
> much simpler.
>
>
> system := { 'id': host_id['os']['id'] }
>
>
>
> > This is a fundamentally important issue, because it affects enrichment,
> > TI, profiling, and even simple data transformations during parsing.
>
> All that being said, I think this highlights one advantage of using a DSL
> like Stellar.  If you do not want to flatten your data, it should be easy
> enough to add whatever Stellar functions might be required to make the
> task simpler.
>
> I hope this helps.

-- 
Vladimir Mikhailov
Director, "Content-Media" LLC
8(347)293-48-20 tel/fax

Re: Fields with a period/dot in the name

Posted by Nick Allen <ni...@nickallen.org>.
Hi Vladimir -

> Converting ECS to flat JSON, where the fields take the form {"system.id":
> "<value>"}, is not a good option, because it defeats the purpose of ECS
> and loses the convenience of the JSON format.

Right, it just depends on your use case.  My hope is that with the
facilities in Metron, you can manipulate the data in whatever manner works
best for you.


> And with deep nesting, this generally turns into unreadable,
> hard-to-maintain code.

The overall approach with Metron has been to flatten nested fields, rather
than deal with deeply nested structures.  Most of the parsers are built to
flatten the data. This presents a "flat" view for all downstream
functionality like Enrichment, Profiling, etc. I assume in your example
that whatever parser you are using is not flattening the data.  Is that
correct?
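For readers unfamiliar with the flattening described above, here is a generic Python sketch. Joining nested keys with "." is an assumption for illustration, not necessarily what every Metron parser does, and `flatten` is a hypothetical helper rather than Metron code:

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty objects; their leaves get prefixed keys.
            flat.update(flatten(value, sep, full_key))
        else:
            # Scalars, lists and empty objects are kept as-is.
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    event = {"host": {"os": {"id": "win10"}}, "user": {"id": "tom"}}
    print(flatten(event))
    # {'host.os.id': 'win10', 'user.id': 'tom'}
```

The flat view makes every field addressable by a single name, which is the trade-off against keeping the ECS nesting.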


> So my question is: is there an easy way to work with nested JSON in
> Stellar? A deep dive into the documentation and source code has not yet
> turned up an answer.
>
>     system := MAP_PUT('id', MAP_GET('id',MAP_GET('os',host_id)), system)
>

Using a map literal would simplify your example a bit.

system := { 'id': MAP_GET('id', MAP_GET('os', host_id)) }


If we complete METRON-2072
<https://issues.apache.org/jira/browse/METRON-2072?jql=text%20~%20%22Stellar%20Map%22>,
adding some syntactic sugar around MAP_PUT/GET, then your example could be
much simpler.


system := { 'id': host_id['os']['id'] }



> This is a fundamentally important issue, because it affects enrichment,
> TI, profiling, and even simple data transformations during parsing.

All that being said, I think this highlights one advantage of using a DSL
like Stellar.  If you do not want to flatten your data, it should be easy
enough to add whatever Stellar functions might be required to make the task
simpler.

I hope this helps.





On Fri, Nov 15, 2019 at 5:14 AM Vladimir Mikhailov <
v.mikhailov@content-media.ru> wrote:

> Hi Nick!
>
> We, like Tom, plan to use Elastic Common Schema (ECS) to store events in
> Metron.
>
> A feature of ECS is the nesting of JSON objects, so the "system.id"
> field implies storage in the form {"system": {"id": "<value>"}}
>
> Converting ECS to flat JSON, where the fields take the form {"system.id":
> "<value>"}, is not a good option, because it defeats the purpose of ECS
> and loses the convenience of the JSON format.
>
> Currently, to work with nested JSON in Stellar, we are forced to use
> complex constructs built from the MAP_GET and MAP_PUT functions, for
> example:
>
> "fieldTransformations": [
>     {
>         "output": ["system"],
>         "transformation": "STELLAR",
>         "config": {
>             "system": "MAP_PUT('id', MAP_GET('id', MAP_GET('os', host_id)), system)"
>         }
>     }
> ]
>
> And with deep nesting, this generally turns into unreadable,
> hard-to-maintain code.
>
> So my question is: is there an easy way to work with nested JSON in
> Stellar? A deep dive into the documentation and source code has not yet
> turned up an answer.
>
> This is a fundamentally important issue, because it affects enrichment,
> TI, profiling, and even simple data transformations during parsing.
>

Re: Fields with a period/dot in the name

Posted by Vladimir Mikhailov <v....@content-media.ru>.
Hi Nick!

We, like Tom, plan to use Elastic Common Schema (ECS) to store events in Metron.

A feature of ECS is the nesting of JSON objects, so the "system.id" field implies storage in the form {"system": {"id": "<value>"}}.
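In other words, ECS treats each dot as a path separator into nested objects. A minimal Python sketch of that expansion, assuming "." is always a separator and never part of a key name; `unflatten` is a hypothetical helper, not Metron or Elasticsearch code:

```python
from typing import Any, Dict


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Expand dotted keys such as "system.id" into nested dicts."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(sep)
        node = nested
        # Walk (and create) the intermediate objects for every part but the last.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"key conflict at {part!r} in {dotted_key!r}")
            node = child
        leaf = parts[-1]
        # Refuse to overwrite an object that other dotted keys already built.
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"key conflict at {leaf!r} in {dotted_key!r}")
        node[leaf] = value
    return nested


if __name__ == "__main__":
    print(unflatten({"system.id": "abc", "client.ip": "10.0.0.1"}))
    # {'system': {'id': 'abc'}, 'client': {'ip': '10.0.0.1'}}
```

Raising on conflicts (for example both "a" and "a.b" present) is a design choice for the sketch; silently overwriting would lose data.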

Converting ECS to flat JSON, where the fields take the form {"system.id": "<value>"}, is not a good option, because it defeats the purpose of ECS and loses the convenience of the JSON format.

Currently, to work with nested JSON in Stellar, we are forced to use complex constructs built from the MAP_GET and MAP_PUT functions, for example:

"fieldTransformations": [
    {
        "output": ["system"],
        "transformation": "STELLAR",
        "config": {
            "system": "MAP_PUT('id', MAP_GET('id', MAP_GET('os', host_id)), system)"
        }
    }
]

And with deep nesting, this generally turns into unreadable, hard-to-maintain code.
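One way to tame the nesting is to address a value by a key path instead of chaining lookups. A minimal Python sketch of the idea; `get_path` and `put_path` are hypothetical helpers, not Stellar functions, and the sample data is made up:

```python
from typing import Any, Dict, Sequence


def get_path(m: Dict[str, Any], path: Sequence[str], default: Any = None) -> Any:
    """Read a nested value; return `default` if any step is missing."""
    node: Any = m
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def put_path(m: Dict[str, Any], path: Sequence[str], value: Any) -> Dict[str, Any]:
    """Return a copy of `m` with `value` stored at `path`, creating maps as needed.

    Sibling keys along the path are preserved and the input is not mutated.
    A non-dict value found midway along the path is replaced by a new map.
    """
    if not path:
        raise ValueError("path must not be empty")
    head, rest = path[0], path[1:]
    result = dict(m)  # shallow copy of this level only
    if rest:
        child = result.get(head)
        result[head] = put_path(child if isinstance(child, dict) else {}, rest, value)
    else:
        result[head] = value
    return result


if __name__ == "__main__":
    host_id = {"os": {"id": "win10", "name": "Windows"}}
    system = {"name": "host-1"}
    # Same resulting map as MAP_PUT('id', MAP_GET('id', MAP_GET('os', host_id)), system)
    system = put_path(system, ["id"], get_path(host_id, ["os", "id"]))
    print(system)  # {'name': 'host-1', 'id': 'win10'}
```

The nesting depth now lives in the path list rather than in nested calls, so a deeper field costs one more list element instead of one more wrapped function.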

So my question is: is there an easy way to work with nested JSON in Stellar? A deep dive into the documentation and source code has not yet turned up an answer.

This is a fundamentally important issue, because it affects enrichment, TI, profiling, and even simple data transformations during parsing.


On 2019/11/01 17:50:29, Nick Allen <ni...@nickallen.org> wrote: 
> Hi Tom -
> 
> > In the case of Metron, should we be modifying the field names to replace
> > dots? Can the Metron STELLAR language handle a field name with a dot in it,
> > or are there any special steps required such as surrounding event fields
> > with single or double-quotes in order to properly handle those field names?
> 
> I cannot think of any facilities within Metron itself that would have
> difficulties with periods in field names.
> 
> 
> > I noticed that our field names arrive with a period in the name, for
> > example "client.ip" and "user.id"... Our internal naming convention is
> > intended to align the data ingestion solution with the Elastic Common
> > Schema. From experience, working with those dots in Elasticsearch is a
> > challenge
> 
> You can use Metron to translate the field names however you like; for
> example, replace "client.ip" with "client_ip". There are some examples of
> this in the Parsers documentation [1]
> <https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html>.
> Look under the section "fieldTransformation configuration".
> 
> 
> ---
> 
> [1]
> https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html

Re: Fields with a period/dot in the name

Posted by Nick Allen <ni...@nickallen.org>.
Hi Tom -

> In the case of Metron, should we be modifying the field names to replace
> dots? Can the Metron STELLAR language handle a field name with a dot in it,
> or are there any special steps required such as surrounding event fields
> with single or double-quotes in order to properly handle those field names?

I cannot think of any facilities within Metron itself that would have
difficulties with periods in field names.


> I noticed that our field names arrive with a period in the name, for
> example "client.ip" and "user.id"... Our internal naming convention is
> intended to align the data ingestion solution with the Elastic Common
> Schema. From experience, working with those dots in Elasticsearch is a
> challenge

You can use Metron to translate the field names however you like; for
example, replace "client.ip" with "client_ip". There are some examples of
this in the Parsers documentation [1]
<https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html>.
Look under the section "fieldTransformation configuration".
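As I understand the "fieldTransformation configuration" section of [1], Metron offers a RENAME transformation whose config maps existing field names to new ones. The Python below is only a minimal sketch of that renaming logic outside Metron, assuming a simple one-to-one mapping; `rename_fields` is a hypothetical name, not part of Metron:

```python
from typing import Any, Dict, Mapping


def rename_fields(message: Dict[str, Any], mapping: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `message` with keys renamed per `mapping` (old -> new).

    Fields not listed in `mapping` are copied unchanged; mapped names that are
    absent from the message are simply ignored. If a new name collides with an
    existing field, the one seen later in iteration order wins.
    """
    renamed: Dict[str, Any] = {}
    for key, value in message.items():
        renamed[mapping.get(key, key)] = value
    return renamed


if __name__ == "__main__":
    msg = {"client.ip": "10.0.0.1", "user.id": "tom", "action": "login"}
    print(rename_fields(msg, {"client.ip": "client_ip", "user.id": "user_id"}))
    # {'client_ip': '10.0.0.1', 'user_id': 'tom', 'action': 'login'}
```

Doing the rename once at parse time means every downstream stage (enrichment, profiling, indexing) sees the underscore names.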


---

[1]
https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html





On Fri, Nov 1, 2019 at 1:21 PM Yerex, Tom <to...@ubc.ca> wrote:

> Good day to everyone. I'm working on our own variation of the Geographic
> Login Outliers use case (
> https://metron.apache.org/current-book/use-cases/geographic_login_outliers/index.html).
> I noticed that our field names arrive with a period in the name, for
> example "client.ip" and "user.id".
>
>
>
> Our internal naming convention is intended to align the data ingestion
> solution with the Elastic Common Schema. From experience, working with
> those dots in Elasticsearch is a challenge, and it raises the question of
> whether we need to handle field names with a dot differently in Metron.
>
>
>
> In the case of Metron, should we be modifying the field names to replace
> dots? Can the Metron STELLAR language handle a field name with a dot in it,
> or are there any special steps required such as surrounding event fields
> with single or double-quotes in order to properly handle those field names?
>
>
>
> Thank you,
>
>
>
> Tom.
>
>
>