Posted to dev@metron.apache.org by Simon Elliston Ball <si...@simonellistonball.com> on 2017/12/01 00:59:50 UTC

DISCUSS: Quick change to parser config

I’m looking at the way parser config works, and the transformation of fields from their native names in, for example, the ASA or CEF parsers into a standard data model.

At the moment I would do something like this:

Assuming I have fields [ipSrc, ipDst, pointlessExtraStuff, message], I might have:

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": ["ip_src_addr", "ip_dst_addr", "message"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dst_addr": "ipDst"
      }
    }
  ]
}

which leaves me with the field set:
[ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dst_addr]

unless I go with:

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": ["ip_src_addr", "ip_dst_addr", "message"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dst_addr": "ipDst",
        "pointlessExtraStuff": null,
        "ipSrc": null,
        "ipDst": null
      }
    }
  ]
}

which seems a little overly verbose.

Do you think it would be valuable to add a switch of some sort on the transformation to make it “complete”, i.e. to preserve only the fields that are explicitly set?

To my mind, this breaks a principle of mutability, but it gives us much, much cleaner mapping of data.

I would propose something like:

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "complete": true,
      "output": ["ip_src_addr", "ip_dst_addr", "message"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dst_addr": "ipDst"
      }
    }
  ]
}

which would give me the set ["ip_src_addr", "ip_dst_addr", "message"], effectively making the nulling in my previous example implicit.
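To make the intended semantics concrete, here is a rough Python model (not Metron code) of a STELLAR block with and without the proposed "complete" switch. Everything in it is a hypothetical stand-in: config values are plain source-field names instead of real Stellar expressions, a None value means "remove this field", and apply_mapping is an illustrative name.

```python
# Hypothetical model of the proposed "complete" switch (not Metron's implementation).
# Config values are treated as plain source-field names standing in for Stellar
# expressions; a None value removes the target field, like the null entries above.

def apply_mapping(message, config, output, complete=False):
    """Return a new message after applying a rename-style mapping.

    complete=False models the status quo described above: assigned fields are
    added and everything else flows through. complete=True keeps only the
    fields listed in `output`.
    """
    result = dict(message)
    for target, source in config.items():
        if source is None:
            result.pop(target, None)          # explicit removal via null
        elif source in message:
            result[target] = message[source]  # copy the native field value
    if complete:
        keep = set(output)
        result = {k: v for k, v in result.items() if k in keep}
    return result


message = {"ipSrc": "10.0.0.1", "ipDst": "10.0.0.2",
           "pointlessExtraStuff": "x", "message": "raw line"}
config = {"ip_src_addr": "ipSrc", "ip_dst_addr": "ipDst"}
output = ["ip_src_addr", "ip_dst_addr", "message"]

print(sorted(apply_mapping(message, config, output)))
print(sorted(apply_mapping(message, config, output, complete=True)))
```

In this model, the verbose config with explicit nulls and complete=False produces the same message as the short config with complete=True, which is the "implicit nulling" equivalence described above.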

Thoughts? 

Also, in the second scenario, if ‘output’ were empty, would we assume that the output field set should be ["ip_src_addr", "ip_dst_addr"]?
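One possible reading of the empty-‘output’ question, sketched in Python: fall back to the fields the block explicitly assigns, and ignore null entries because those are removals rather than outputs. resolve_output is a hypothetical helper, not part of Metron.

```python
# Hypothetical helper: the effective output list for a "complete" STELLAR
# transformation when "output" is omitted or empty (an assumption, not Metron behaviour).

def resolve_output(output, config):
    """Return `output` if it is non-empty, else the config keys that assign a value."""
    if output:
        return list(output)
    # Default to every field the block explicitly sets; null entries are
    # removals, so they are not treated as outputs.
    return [field for field, expr in config.items() if expr is not None]


config = {"ip_src_addr": "ipSrc", "ip_dst_addr": "ipDst", "junk": None}
print(resolve_output([], config))         # ['ip_src_addr', 'ip_dst_addr']
print(resolve_output(["message"], config))  # ['message']
```

Under this reading, "message" would be dropped by a complete transformation with no output list unless it is listed explicitly, which is worth deciding deliberately.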

Simon

Re: DISCUSS: Quick change to parser config

Posted by Otto Fowler <ot...@gmail.com>.

I’m not sure about consensus. I would like to see it summarized.

My point about assignment has to do with how many assignment-like operators
we are going to support. Whether the assignment is to a temporary variable
or not doesn’t need to be part of the grammar/language; since all variable
management is external in Stellar, that may not be necessary.

On December 4, 2017 at 13:14:23, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Personally I suspect that temporary variables are a different thing, as is
the assignment PR. They might be useful for intermediate steps in a parser,
but then we’re potentially getting more complex than a parser wants to be. I
am warming to the idea of temporary variables, though.

In terms of removal, I like the idea of the COMPLETE transformation to
express a projection. That makes the output interface of the Metron object
more explicit in a parser, which makes governance much easier.

Do we think this is a good consensus? Shall I ticket it (I might even code
it!) in the transformation form proposed?

Simon

Re: DISCUSS: Quick change to parser config

Posted by Simon Elliston Ball <si...@simonellistonball.com>.

Personally I suspect that temporary variables are a different thing, as is the assignment PR. They might be useful for intermediate steps in a parser, but then we’re potentially getting more complex than a parser wants to be. I am warming to the idea of temporary variables, though. 

In terms of removal, I like the idea of the COMPLETE transformation to express a projection. That makes the output interface of the Metron object more explicit in a parser, which makes governance much easier. 

Do we think this is a good consensus? Shall I ticket it (I might even code it!) in the transformation form proposed? 

Simon

> On 4 Dec 2017, at 17:21, Casey Stella <ce...@gmail.com> wrote:
> 
> So, just chiming in here.  It seems to me that we have a problem with
> extraneous fields in a couple of different ways:
> 
> * Temporary Variables
> 
> I think that the problem of temporary variables goes beyond just the
> parser.  What I'd like to see is the Stellar field transformations operating
> similarly to the enrichment field transformations, in that they are no
> longer a map (this is useful beyond this case for having multiple
> assignments for a variable), and having a special assignment indicator
> that marks a temporary variable (e.g. ^= instead of :=).  This would clean
> up some of the use cases in enrichments as well.  Combine this with the
> assumption that all non-temporary fields are included in the output of the
> field transformation if "output" is not specified, and I think we have
> something that is sensible and somewhat backwards compatible.  To wit:
> {
>  "fieldTransformations": [
>    {
>      "transformation": "STELLAR",
>      "config": [
>        "ipSrc ^= TRIM(raw_ip_src)",
>        "ip_src_addr := ipSrc"
>      ]
>    }
>  ]
> }
> 
> * Extraneous Fields from the Parser
> 
> For these, we do currently have a REMOVE field transformation, but I'd be
> ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
> That might look like:
> {
>  "fieldTransformations": [
>    {
>      "transformation": "STELLAR",
>      "config": [
>        "ipSrc ^= TRIM(raw_ip_src)",
>        "ip_src_addr := ipSrc"
>      ]
>    },
>    {
>      "transformation": "COMPLETE",
>      "output" : [ "ip_src_addr", "ip_dst_addr", "message"]
>    }
>  ]
> }
> 
> I think having these two treated separately makes sense because sometimes
> you will want COMPLETE and sometimes not.  Also, this fits within the core
> abstraction that we already have.

Re: DISCUSS: Quick change to parser config

Posted by Otto Fowler <ot...@gmail.com>.

Would https://github.com/apache/metron/pull/687 play some role in this?
Or could it be made to?


On December 4, 2017 at 12:21:40, Casey Stella (cestella@gmail.com) wrote:

So, just chiming in here.  It seems to me that we have a problem with
extraneous fields in a couple of different ways:

* Temporary Variables

I think that the problem of temporary variables goes beyond just the
parser.  What I'd like to see is the Stellar field transformations operating
similarly to the enrichment field transformations, in that they are no
longer a map (this is useful beyond this case for having multiple
assignments for a variable), and having a special assignment indicator
that marks a temporary variable (e.g. ^= instead of :=).  This would clean
up some of the use cases in enrichments as well.  Combine this with the
assumption that all non-temporary fields are included in the output of the
field transformation if "output" is not specified, and I think we have
something that is sensible and somewhat backwards compatible.  To wit:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    }
  ]
}

* Extraneous Fields from the Parser

For these, we do currently have a REMOVE field transformation, but I'd be
ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
That might look like:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    },
    {
      "transformation": "COMPLETE",
      "output" : [ "ip_src_addr", "ip_dst_addr", "message"]
    }
  ]
}

I think having these two treated separately makes sense because sometimes
you will want COMPLETE and sometimes not.  Also, this fits within the core
abstraction that we already have.

Re: DISCUSS: Quick change to parser config

Posted by Casey Stella <ce...@gmail.com>.

So, just chiming in here.  It seems to me that we have a problem with
extraneous fields in a couple of different ways:

* Temporary Variables

I think that the problem of temporary variables goes beyond just the
parser.  What I'd like to see is the Stellar field transformations operating
similarly to the enrichment field transformations, in that they are no
longer a map (this is useful beyond this case for having multiple
assignments for a variable), and having a special assignment indicator
that marks a temporary variable (e.g. ^= instead of :=).  This would clean
up some of the use cases in enrichments as well.  Combine this with the
assumption that all non-temporary fields are included in the output of the
field transformation if "output" is not specified, and I think we have
something that is sensible and somewhat backwards compatible.  To wit:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    }
  ]
}
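A rough Python sketch of the ordered-list semantics described above, under loud assumptions: ^= is the proposed (not existing) temporary-assignment operator, run_statements and the tiny function table are hypothetical, and only bare field names or FUNC(field) expressions are understood, which is nowhere near real Stellar.

```python
import re

# Hypothetical mini-interpreter for an ordered list of assignments.
# ":=" writes a field to the message; "^=" (proposed) binds a temporary that
# later statements can read but that is never written to the message.
FUNCTIONS = {"TRIM": str.strip, "TO_UPPER": str.upper, "TO_LOWER": str.lower}
STATEMENT = re.compile(r"^\s*(\w+)\s*(:=|\^=)\s*(.+?)\s*$")
CALL = re.compile(r"^(\w+)\((\w+)\)$")


def evaluate(expr, scope):
    """Evaluate a bare field name or a single-argument FUNC(field) call."""
    call = CALL.match(expr)
    if call:
        func, arg = call.groups()
        return FUNCTIONS[func](scope[arg])
    return scope[expr]


def run_statements(message, statements):
    """Apply statements in order and return the resulting message."""
    result = dict(message)
    temporaries = {}
    for stmt in statements:
        match = STATEMENT.match(stmt)
        if not match:
            raise ValueError(f"cannot parse statement: {stmt!r}")
        name, op, expr = match.groups()
        # Temporaries shadow message fields of the same name.
        value = evaluate(expr, {**result, **temporaries})
        if op == "^=":
            temporaries[name] = value
        else:
            result[name] = value
    return result


msg = {"raw_ip_src": "  10.0.0.1 "}
print(run_statements(msg, ["ipSrc ^= TRIM(raw_ip_src)", "ip_src_addr := ipSrc"]))
```

Note that raw_ip_src still flows through in this sketch; that is the separate "extraneous fields" problem, which a whitelist step would address.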

* Extraneous Fields from the Parser

For these, we do currently have a REMOVE field transformation, but I'd be
ok with a PROJECT or COMPLETE field transformation to provide a whitelist.
That might look like:
{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "config": [
        "ipSrc ^= TRIM(raw_ip_src)",
        "ip_src_addr := ipSrc"
      ]
    },
    {
      "transformation": "COMPLETE",
      "output" : [ "ip_src_addr", "ip_dst_addr", "message"]
    }
  ]
}

I think having these two treated separately makes sense because sometimes
you will want COMPLETE and sometimes not.  Also, this fits within the core
abstraction that we already have.
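To illustrate why a blacklist and a whitelist behave differently as separate steps in a chain, here is a Python sketch. The COMPLETE name comes from this thread; the handler table, apply_chain, and the use of an "input" list to name the fields REMOVE drops are illustrative assumptions rather than Metron's actual implementation.

```python
# Hypothetical chain of field transformations: REMOVE is a blacklist,
# COMPLETE (the name proposed in this thread) is a whitelist.

def remove(message, step):
    """Drop the fields named in the step's (assumed) "input" list."""
    drop = set(step.get("input", []))
    return {k: v for k, v in message.items() if k not in drop}


def complete(message, step):
    """Keep only the fields named in the step's "output" list."""
    keep = set(step["output"])
    return {k: v for k, v in message.items() if k in keep}


HANDLERS = {"REMOVE": remove, "COMPLETE": complete}


def apply_chain(message, steps):
    """Run each transformation step in order, feeding its result to the next."""
    for step in steps:
        message = HANDLERS[step["transformation"]](message, step)
    return message


msg = {"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.2", "message": "raw",
       "vendor_field_1": "a", "vendor_field_2": "b"}
blacklist = [{"transformation": "REMOVE", "input": ["vendor_field_1"]}]
whitelist = [{"transformation": "COMPLETE",
              "output": ["ip_src_addr", "ip_dst_addr", "message"]}]
print(sorted(apply_chain(msg, blacklist)))  # vendor_field_2 still leaks through
print(sorted(apply_chain(msg, whitelist)))
```

With REMOVE, any field the parser starts emitting later slips through unless someone remembers to list it; with a whitelist step, only the declared output survives, which is the governance benefit raised in the thread.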

On Thu, Nov 30, 2017 at 8:21 PM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> Hmmm… Actually, I kinda like that.
>
> May want a little refactoring in the back for clarity.
>
> My question about whether we could ever imagine this ‘cleanup policy’
> applying to other transforms would sway me towards the field approach
> rather than the transformation-name approach, though.
>
> Simon

Re: DISCUSS: Quick change to parser config

Posted by Simon Elliston Ball <si...@simonellistonball.com>.

Hmmm… Actually, I kinda like that. 

May want a little refactoring in the back for clarity. 

My question about whether we could ever imagine this ‘cleanup policy’ applying to other transforms would sway me towards the field approach rather than the transformation-name approach, though. 

Simon

> On 1 Dec 2017, at 01:17, Otto Fowler <ot...@gmail.com> wrote:
> 
> Or, we could create a new transformation type,
> STELLAR_COMPLETE, which may be more in line with the original design.

Re: DISCUSS: Quick change to parser config

Posted by Otto Fowler <ot...@gmail.com>.

Or, we could create a new transformation type,
STELLAR_COMPLETE, which may be more in line with the original design.

On November 30, 2017 at 20:14:46, Otto Fowler (ottobackwards@gmail.com)
wrote:

I would suggest that instead of explicitly having “complete”, we have
"operation": "complete", so that we can have multiple transformations,
each with a different “operation”.
No operation would be the status quo ante, if we can do it so that we don’t
get errors with old configs and keep the same behavior.

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "operation": "complete",
      "output": ["ip_src_addr", "ip_dst_addr"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dst_addr": "ipDst"
      }
    },
    {
      "transformation": "STELLAR",
      "operation": "SomeOtherThing",
      "output": ["foo", "bar"],
      "config": {
        "foo": "TO_UPPER(foo)",
        "bar": "TO_LOWER(bar)"
      }
    }
  ]
}


Sorry for the junk examples, but hopefully it makes sense.
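A minimal Python sketch of how a missing "operation" key could preserve today's behaviour so that old configs load unchanged. The handler names and the finish() hook are hypothetical, and the sketch assumes unknown operations should fail fast rather than be ignored.

```python
# Hypothetical post-processing hook keyed on an optional "operation" field.

def passthrough(message, result, output):
    """Status quo: computed fields are merged in and everything else flows through."""
    merged = dict(message)
    merged.update(result)
    return merged


def complete_op(message, result, output):
    """Keep only the fields named in the transformation's output list."""
    keep = set(output)
    merged = passthrough(message, result, output)
    return {k: v for k, v in merged.items() if k in keep}


# None stands for "no operation key present", i.e. an old config.
OPERATIONS = {None: passthrough, "complete": complete_op}


def finish(message, result, transformation_cfg):
    op = transformation_cfg.get("operation")
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op!r}")
    return OPERATIONS[op](message, result, transformation_cfg.get("output", []))


message = {"ipSrc": "10.0.0.1", "message": "raw"}
result = {"ip_src_addr": "10.0.0.1"}  # what the STELLAR block computed
old_cfg = {"transformation": "STELLAR", "output": ["ip_src_addr"]}
new_cfg = dict(old_cfg, operation="complete")
print(finish(message, result, old_cfg))
print(finish(message, result, new_cfg))
```

Rejecting an unrecognised operation such as "SomeOtherThing" is a design choice in this sketch; silently ignoring it would be more lenient but could hide typos in configs.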

Re: DISCUSS: Quick change to parser config

Posted by Simon Elliston Ball <si...@simonellistonball.com>.

Do you have any thoughts on what these other operations might be? 

What I’m imagining is something that basically specifies a policy on how to handle things that the transformation block does not explicitly handle. Right now, we just leave them alone and they flow through.

Would “policy”: “explicit” or “policy”: “onlyExplicit” make sense and give the flexibility?
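To make the policy idea concrete, here is a rough Java sketch of how a block might treat fields it does not mention. It assumes each config entry is a plain rename (standing in for a Stellar expression that just references a field), that a null value removes a field as in the nulling example, and that under the stricter policy only the fields named in `output` survive, falling back to the config keys when `output` is empty (the open question from the original post). `FieldPolicySketch`, `UnhandledFieldPolicy`, `PASS_THROUGH`, `ONLY_EXPLICIT` and `applyBlock` are hypothetical names, not Metron APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldPolicySketch {

  // Hypothetical policy for fields a transformation block does not mention.
  enum UnhandledFieldPolicy { PASS_THROUGH, ONLY_EXPLICIT }

  /**
   * Applies one block. Each config entry maps an output field to a source field
   * (a stand-in for a Stellar expression that only references a field); a null
   * source removes the field, mirroring the "field": null idiom.
   */
  static Map<String, Object> applyBlock(Map<String, Object> message,
                                        Map<String, String> config,
                                        List<String> output,
                                        UnhandledFieldPolicy policy) {
    Map<String, Object> result = new LinkedHashMap<>(message);
    for (Map.Entry<String, String> e : config.entrySet()) {
      if (e.getValue() == null) {
        result.remove(e.getKey());
      } else {
        result.put(e.getKey(), message.get(e.getValue()));
      }
    }
    if (policy == UnhandledFieldPolicy.PASS_THROUGH) {
      return result; // today's behaviour: unmentioned fields flow through
    }
    // Assumption: an empty output list falls back to the config keys.
    List<String> keep = output.isEmpty() ? new ArrayList<String>(config.keySet()) : output;
    Map<String, Object> explicit = new LinkedHashMap<>();
    for (String field : keep) {
      if (result.containsKey(field)) {
        explicit.put(field, result.get(field));
      }
    }
    return explicit;
  }

  public static void main(String[] args) {
    Map<String, Object> msg = new LinkedHashMap<>();
    msg.put("ipSrc", "10.0.0.1");
    msg.put("ipDst", "10.0.0.2");
    msg.put("pointlessExtraStuff", "x");
    msg.put("message", "hello");

    Map<String, String> config = new LinkedHashMap<>();
    config.put("ip_src_addr", "ipSrc");
    config.put("ip_dst_addr", "ipDst");
    List<String> output = List.of("ip_src_addr", "ip_dst_addr", "message");

    System.out.println(applyBlock(msg, config, output, UnhandledFieldPolicy.PASS_THROUGH).keySet());
    System.out.println(applyBlock(msg, config, output, UnhandledFieldPolicy.ONLY_EXPLICIT).keySet());
  }
}
```

Run against the original example, the first call prints [ipSrc, ipDst, pointlessExtraStuff, message, ip_src_addr, ip_dst_addr] and the second prints [ip_src_addr, ip_dst_addr, message], i.e. the implicit nulling described at the start of the thread.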

To my mind “operation” implies further transformation, which would just be another block, no? 

Maybe it’s just semantic pedantry on my part… would we see this sort of policy logic applying to other transformations? It doesn’t really make sense for “remove”, and well… who cares about any of the other legacy transforms now we have Stellar :) 

Simon

Re: DISCUSS: Quick change to parser config

Posted by Otto Fowler <ot...@gmail.com>.

I would suggest that, instead of explicitly having “complete”, we have
“operation”: “complete”.

That way we can have multiple transformations, each with a different
“operation”.
Having no operation would preserve the status quo ante, provided we can do it
so that old configs don’t produce errors and keep the same behavior.
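For the backward-compatibility point, a minimal Java sketch of reading the proposed key, assuming the block has already been parsed into a map. A missing "operation" falls back to today's behaviour; `OperationConfigSketch`, `Operation`, `DEFAULT`, `COMPLETE` and `operationOf` are hypothetical names, not part of Metron.

```java
import java.util.Locale;
import java.util.Map;

public class OperationConfigSketch {

  // Hypothetical values for the proposed "operation" key.
  enum Operation { DEFAULT, COMPLETE }

  /**
   * Reads "operation" from a parsed transformation block. A missing key maps to
   * DEFAULT, so existing configs keep today's behaviour without errors.
   */
  static Operation operationOf(Map<String, Object> block) {
    Object raw = block.get("operation");
    if (raw == null) {
      return Operation.DEFAULT;
    }
    try {
      return Operation.valueOf(raw.toString().trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      // Unknown values fail loudly instead of silently keeping every field.
      throw new IllegalArgumentException("Unknown operation: " + raw, e);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> legacy = Map.of("transformation", "STELLAR");
    Map<String, Object> complete = Map.of("transformation", "STELLAR", "operation", "complete");
    System.out.println(operationOf(legacy));   // DEFAULT
    System.out.println(operationOf(complete)); // COMPLETE
  }
}
```

Rejecting unknown values, rather than treating them as the default, is a design choice in this sketch: a typo in "operation" fails at config time instead of quietly changing which fields survive. With only these two values, the placeholder "SomeOtherThing" in the example would be rejected.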

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "operation": "complete",
      "output": ["ip_src_addr", "ip_dst_addr"],
      "config": {
        "ip_src_addr": "ipSrc",
        "ip_dst_addr": "ipDst"
      }
    },
    {
      "transformation": "STELLAR",
      "operation": "SomeOtherThing",
      "output": ["foo", "bar"],
      "config": {
        "foo": "TO_UPPER(foo)",
        "bar": "TO_LOWER(bar)"
      }
    }
  ]
}


Sorry for the junk examples, but hopefully it makes sense.
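One consequence of chaining blocks is that their order matters. Here is a rough Java sketch, assuming blocks run in list order, that a "complete" block keeps only the fields in its `output`, and that a null result removes a field. `BlockOrderSketch`, `Expr`, `Block`, `apply`, `field`, `upper` and `lower` are hypothetical helpers, with plain Java lambdas standing in for `TO_UPPER(foo)` and `TO_LOWER(bar)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class BlockOrderSketch {

  /** Hypothetical stand-in for a Stellar expression evaluated against the message. */
  interface Expr {
    Object eval(Map<String, Object> message);
  }

  /** Hypothetical stand-in for one entry in "fieldTransformations". */
  static final class Block {
    final boolean complete;          // true for "operation": "complete"
    final List<String> output;
    final Map<String, Expr> config;

    Block(boolean complete, List<String> output, Map<String, Expr> config) {
      this.complete = complete;
      this.output = output;
      this.config = config;
    }
  }

  /** Applies the blocks one after another, in list order (an assumption). */
  static Map<String, Object> apply(Map<String, Object> message, List<Block> blocks) {
    Map<String, Object> current = new LinkedHashMap<>(message);
    for (Block block : blocks) {
      Map<String, Object> next = new LinkedHashMap<>(current);
      for (Map.Entry<String, Expr> e : block.config.entrySet()) {
        Object value = e.getValue().eval(current);
        if (value == null) {
          next.remove(e.getKey()); // a null result removes the field
        } else {
          next.put(e.getKey(), value);
        }
      }
      if (block.complete) {
        next.keySet().retainAll(block.output); // keep only the listed fields
      }
      current = next;
    }
    return current;
  }

  static Expr field(String name) {
    return m -> m.get(name);
  }

  static Expr upper(String name) {
    return m -> m.get(name) == null ? null : m.get(name).toString().toUpperCase(Locale.ROOT);
  }

  static Expr lower(String name) {
    return m -> m.get(name) == null ? null : m.get(name).toString().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    Map<String, Object> msg = new LinkedHashMap<>();
    msg.put("ipSrc", "10.0.0.1");
    msg.put("ipDst", "10.0.0.2");
    msg.put("foo", "Abc");
    msg.put("bar", "XyZ");

    Map<String, Expr> renames = new LinkedHashMap<>();
    renames.put("ip_src_addr", field("ipSrc"));
    renames.put("ip_dst_addr", field("ipDst"));
    Map<String, Expr> cases = new LinkedHashMap<>();
    cases.put("foo", upper("foo"));
    cases.put("bar", lower("bar"));

    Block completeBlock = new Block(true, List.of("ip_src_addr", "ip_dst_addr"), renames);
    Block caseBlock = new Block(false, List.of("foo", "bar"), cases);

    // Same order as the example: foo and bar are already gone when caseBlock runs.
    System.out.println(apply(msg, List.of(completeBlock, caseBlock)));
  }
}
```

With the blocks in the order shown in the example, this prints {ip_src_addr=10.0.0.1, ip_dst_addr=10.0.0.2}: foo and bar are dropped by the "complete" block before the second block can see them. Listing foo and bar in the first block's output would keep them, so they reach the second block.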