Posted to dev@metron.apache.org by Nick Allen <ni...@nickallen.org> on 2017/02/01 19:11:24 UTC

[Discuss] Improve Alerting

I'd like to explore the functionality that we have in Metron using a
motivating example.  I think this will help highlight some gaps where we
can enhance Metron.

The motivating example is that I would like to create an alert if the
number of inbound flows to any host over a 15 minute interval is abnormal.
I would like the alert to contain the specific information below to
streamline the triage process.

Rule: Abnormal number of inbound flows
Bin: 15 mins
Alert: The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
the threshold of '202'


*What Works*

In some ways, this example is similar to the "Outlier Detection" demo that
I performed with the Profiler a few months back.  We have most of what we
need to do this, with a couple of caveats.

1. An enrichment would be needed to add the correct internal hostname
'powned.svr.bank.com' to the message.

2. With the Profiler, I can capture some idea of what "normal" is for the
number of inbound flows across 15 minute intervals.
3. With Threat Triage, I can create rules that alert when a value exceeds
what the Profiler defines as normal.
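
As a rough sketch of what those last two pieces might look like (the
field names, the 'yaf' source type, and the exact syntax here are
illustrative rather than a tested configuration), the profile that
counts inbound flows per host could be something like the following,
with the 15 minute bin coming from the Profiler's period settings:

{
  "profiles": [
    {
      "profile": "inbound-flow-count",
      "foreach": "ip_dst_addr",
      "onlyif":  "source.type == 'yaf'",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ]
}

A Threat Triage rule could then compare the current count against a
threshold derived from the profile's history, along the lines of
"flow_count > flow_count_threshold", where 'flow_count_threshold' is a
field computed during enrichment from the profile (again, the names are
illustrative).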


*What's Missing*

It's nice to know that we are almost all the way there with this example.
Unfortunately, there are two gaps that fall out of this.

 1. *Threat Triage Transparency*

There is little transparency into the Threat Triage process itself.  When
Threat Triage runs, all I get is a score.  I don't know how that score was
arrived at, which rules were triggered, or which specific values caused a
rule to trigger.

More specifically, there is no way to generate a message that looks like
"The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
threshold of '202'".


2. *Triage Calculated Values from the Profiler*

Also, the value being interrogated here, the number of inbound flows, is
not a static value contained within any single telemetry message.  This
value is calculated across multiple messages by the Profiler.  The current
Threat Triage process cannot be used to interrogate values calculated by
the Profiler.


To try to keep this email concise and digestible, I am going to send a
follow-on discussing proposed solutions for each of these separately.

Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
*Problem*

Triage Calculated Values from the Profiler

The value being interrogated here, the number of inbound flows, is not a
static value contained within any single telemetry message.  This value is
calculated across multiple messages by the Profiler.  The current Threat
Triage process cannot be used to interrogate values calculated by the
Profiler.

*Proposed Solution*

What I am proposing here is that we treat the Profiler as a source of
telemetry.   The measurements captured by the Profiler would be enqueued
into a Kafka topic.  We would then treat those Profiler messages like any
other telemetry.  We would parse, enrich, triage, and index those messages.
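
As a purely illustrative sketch (none of these field names exist today;
this is just the shape I have in mind), a measurement the Profiler drops
onto the topic might look like:

{
  "source.type": "profiler",
  "profile": "inbound-flow-count",
  "entity": "powned.svr.bank.com",
  "period.start": 1485975600000,
  "period.end": 1485976500000,
  "value": 230
}

From there it would flow through parsing, enrichment, triage, and
indexing just like any other telemetry.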

While this isn't fully formed in my head, I am throwing this out there as I
think it's rather interesting.  I think this would have the following
advantages.

1.  We would be able to reuse the same threat triage mechanism for values
calculated by the Profiler.

2.  We would be able to generate profiles from the profiled data - aka
meta-profiles anyone?  I am really curious if this helps the MAD use case
at all.

3.  We could also potentially have all the data produced by the Profiler
written to HBase using the same indexing mechanism as all the other
telemetry.


-- 
Nick Allen <ni...@nickallen.org>

Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
Agreed.

What do you think about using the existing Indexing topology to write the
data to HBase for the profiler?

   - The Profiler would have only one output: Kafka.  The Profiler would
   not write to HBase.


   - Since the Profiler is just another source of telemetry, it is parsed,
   enriched, triaged, and then indexed.


   - Thanks to your recent work, we can now configure each 'indexer'
   separately, so we would just have an HBase indexer.

Seems like a logical extension of this idea.  Probably a little
overreaching as a first pass.  Maybe that is something we can evolve
towards.
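
As a very rough sketch (the 'hbase' writer below is hypothetical; only
the overall per-writer shape follows the existing indexing configs), the
indexing configuration for the profiler sensor might end up looking
something like:

{
  "hbase": {
    "enabled": true,
    "batchSize": 5
  },
  "elasticsearch": {
    "enabled": false
  },
  "hdfs": {
    "enabled": false
  }
}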



On Wed, Feb 1, 2017 at 2:37 PM, Casey Stella <ce...@gmail.com> wrote:

> Yeah, I think your solution and mine are the same based on reading your
> suggestion.  Just add a write section to the profile and you can write
> right back into the kafka queue and get all the triage goodness.  You would
> need to ensure that you don't end up with infinite loops back in the
> profiler.  So, things like profiles that interact with EVERY message and
> send a message back to the kafka queue in enrichment would be bad.



-- 
Nick Allen <ni...@nickallen.org>

Re: [Discuss] Improve Alerting

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
Otto, I think you're thinking of the "Enrich enrichment" dev mailing list
thread that Dima started.  In that case, we chatted about how passing
through enrichment multiple times while decrementing a TTL field to prevent
infinite loops (and dropping the message to the error queue if TTL == 0)
would work just fine, and that sounds like it may relate here as well.
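
As a minimal sketch of that idea, borrowing the config style used
elsewhere in this thread (the 'ttl' field and everything else here is
hypothetical):

   - When a message is written back onto the enrichment topic, stamp it
   with a small budget, e.g. "ttl" : 3
   - On each subsequent pass through enrichment, decrement it, e.g.
   "ttl" : "ttl - 1"
   - Guard anything that re-injects messages with "ttl > 0", and route
   the message to the error queue once it reaches 0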

I'm not comfortable enough with the profiler in this scenario to comment
authoritatively, other than to say that I'm really liking the suggestions
that have gone back and forth so far.

Jon

On Wed, Feb 1, 2017 at 2:48 PM Otto Fowler <ot...@gmail.com> wrote:

Isn’t this similar to the discussion around multi-pass enrichment?

JonZ -> you are good at looking up discussions ;)




-- 

Jon

Sent from my mobile device

Re: [Discuss] Improve Alerting

Posted by Otto Fowler <ot...@gmail.com>.
Isn’t this similar to the discussion around multi-pass enrichment?

JonZ -> you are good at looking up discussions ;)




Re: [Discuss] Improve Alerting

Posted by Casey Stella <ce...@gmail.com>.
Yeah, I think your solution and mine are the same based on reading your
suggestion.  Just add a write section to the profile and you can write
right back into the kafka queue and get all the triage goodness.  You would
need to ensure that you don't end up with infinite loops back in the
profiler.  So, things like profiles that interact with EVERY message and
send a message back to the kafka queue in enrichment would be bad.

On Wed, Feb 1, 2017 at 2:35 PM, Nick Allen <ni...@nickallen.org> wrote:

> Great.  I think we're thinking along the same lines.  I just sent a
> follow-up of another proposal that takes this idea a little further.  What
> if we treated the Profiler as another source of telemetry?

Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
Great.  I think we're thinking along the same lines.  I just sent a
follow-up of another proposal that takes this idea a little further.  What
if we treated the Profiler as another source of telemetry?




-- 
Nick Allen <ni...@nickallen.org>

Re: [Discuss] Improve Alerting

Posted by Casey Stella <ce...@gmail.com>.
Regarding point 2, could we enable the profiler to write data to kafka and
the enrichment queue?

I'm proposing the profiler do something like this:

   - Count the number of inbound flows
   - On the tick, send a message to the enrichment queue containing:
      - the number of flows
      - A source type of 'system_alert'
      - is_alert set to true
   - In enrichment, we enrich and triage system_alert source data in the
   same way we do any other.

This would not solve the transparency issue, but it would at least keep
triage in one place in the architecture.  Also, enabling kafka writing
would open up other types of use cases, like situations where we find
outliers *directly* in the profile and send the alerts straight to the
indexing queue without triage.

The only changes this proposal would require would be

   1. a "write" section to a profile that takes a list of stellar
   statements and gets run on the tick write
   2. fixing the kafka writing stellar functions
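
As a rough sketch of what that "write" section could look like (the
section itself is the proposal here, the KAFKA_PUT name and signature
are assumed rather than settled, and 'entity'/'count' are guesses at
what would be in scope on the tick):

{
  "profile": "inbound-flow-count",
  "foreach": "ip_dst_addr",
  "init":    { "count": "0" },
  "update":  { "count": "count + 1" },
  "result":  "count",
  "write": [
    "KAFKA_PUT('enrichments', { 'source.type' : 'system_alert', 'is_alert' : 'true', 'ip_dst_addr' : entity, 'flow_count' : count })"
  ]
}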

Casey


Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
To close out this discussion, I created another JIRA to take care of the
"*Triage Calculated Values from the Profiler*" problem.  Feel free to let
me know if anything else was missed.

[1] Triage Metrics Produced by the Profiler
https://issues.apache.org/jira/browse/METRON-701



On Thu, Feb 2, 2017 at 10:15 AM, Nick Allen <ni...@nickallen.org> wrote:

> I created 3 separate JIRAs to track the "Threat Triage Transparency"
> portion of the work falling out of this discussion thread.  The first would
> create a mechanism to do string interpolation.  The second would enhance
> threat triage to use the string interpolation.  The third would enhance the
> output of threat triage.
>
> [1] Create String Formatting Function for Stellar
> https://issues.apache.org/jira/browse/METRON-687
>
> [2] Allow Threat Triage Comment Field to Contain Stellar Expressions
> https://issues.apache.org/jira/browse/METRON-688
>
> [3] Record of Rule Set that Fired During Threat Triage
> https://issues.apache.org/jira/browse/METRON-686
>
> Please let me know if anyone's concerns were not captured.  I will create
> additional JIRAs for the other portion of the effort (*Triage Calculated
> Values from the Profiler)* once I've given everyone a little more time to
> voice an opinion.
> ​
>
> On Thu, Feb 2, 2017 at 9:46 AM, Nick Allen <ni...@nickallen.org> wrote:
>
>> Oh, I see.  Yes, very useful.
>>
>>
>> On Thu, Feb 2, 2017 at 9:39 AM, Simon Elliston Ball <
>> simon@simonellistonball.com> wrote:
>>
>>> That’s a part of it, certainly (and fixes another of my bug bears, so
>>> thank you!)
>>>
>>> In addition to the aggregation being stellar, I want score to be a
>>> stellar statement, I’ve put in a separate ticket for that.
>>> https://issues.apache.org/jira/browse/METRON-685 <
>>> https://issues.apache.org/jira/browse/METRON-685>
>>>
>>> Simon
>>>
>>> > On 2 Feb 2017, at 14:31, Nick Allen <ni...@nickallen.org> wrote:
>>> >
>>> >> I would much rather be able to say something like score = some stellar
>>> >> statement that returns a float...
>>> >
>>> >
>>> > Completely agree.  FYI - We added METRON-683 yesterday that I believe
>>> > supports what you are saying.  Feel free to add commentary.
>>> >
>>> > https://issues.apache.org/jira/browse/METRON-683
>>> >
>>> > On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
>>> > simon@simonellistonball.com> wrote:
>>> >
>>> >> I completely agree with Nick’s transparency comments, and like the
>>> design
>>> >> of the configuration, especially provision for messaging around the
>>> nature
>>> >> of the rule fired.
>>> >>
>>> >> I would just like to add a small point on the capabilities here. If
>>> the
>>> >> message could have embedded values through some sort of template for a
>>> >> stellar statement, it would make for a better more dynamic alert
>>> reason.
>>> >>
>>> >> I would also like to see the score field capable of outputting the
>>> value
>>> >> of a stellar statement. At the moment the idea of a static score being
>>> >> passed on means that if I have a probabilistic result I want to
>>> combine
>>> >> with other triage sources, I have to do a lot of bucketing into fixed
>>> >> values. I would much rather be able to say something like score = some
>>> >> stellar statement that returns a float, ‘alertness' = threshold of
>>> this.
>>> >> That way I can combine multiple triage rules to trigger an overall
>>> alert,
>>> >> making the aggregators more meaningful.
>>> >>
>>> >> Simon
>>> >>
>>> >>
>>> >>> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
>>> >>>
>>> >>> For profiler alerts it will be helpful during analysis to see the
>>> alerts
>>> >> that caused the anomaly.  The meta alert is useful for incidents
>>> involving
>>> >> correlation of multiple events.
>>> >>>
>>> >>> Also you will need to filter out known hosts that trigger anomalies.
>>> >> For example vulnerability scanning software.
>>> >>>
>>> >>> One final thing to consider is anomalies happen every day without a
>>> >> security incident.  Depending on the network the profiler alerts
>>> could get
>>> >> very noisy so it might be better to correlate profiler alerts with
>>> other
>>> >> alerts.
>>> >>>
>>> >>> Thanks
>>> >>> Carolyn
>>> >>>
>>> >>>
>>> >>>
>>> >>> Sent from my Verizon, Samsung Galaxy smartphone
>>> >>>
>>> >>>
>>> >>> -------- Original message --------
>>> >>> From: Casey Stella <ce...@gmail.com>
>>> >>> Date: 2/1/17 2:28 PM (GMT-05:00)
>>> >>> To: dev@metron.incubator.apache.org
>>> >>> Subject: Re: [Discuss] Improve Alerting
>>> >>>
>>> >>> I like the direction.  One thing that we may want is for comment to
>>> just
>>> >> be
>>> >>> a stellar expression and construct a function to essentially do
>>> >>> String.format().  So, that'd become:
>>> >>> "triageConfig" : {
>>> >>> "riskLevelRules" : [
>>> >>>   {
>>> >>>     "name" : "Abnormal Value",
>>> >>>     "comment" : "FORMAT('For %s; the value %s exceeds threshold of
>>> %d',
>>> >>> hostname, value, value_threshold)"
>>> >>>     "rule" : "value > value_threshold",
>>> >>>     "score" : 10
>>> >>>   }
>>> >>> ],
>>> >>> "aggregator" : "MAX"
>>> >>> }
>>> >>>
>>> >>> The reason:
>>> >>>
>>> >>>  - It's integrated and stellar is our default scripting layer
>>> >>>  - It supports doing some computation in the message
>>> >>>
>>> >>>
>>> >>> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org>
>>> wrote:
>>> >>>
>>> >>>> Like I said, here is a proposed solution to one of the gaps I
>>> >> identified in
>>> >>>> the previous email.
>>> >>>>
>>> >>>> *Problem*
>>> >>>>
>>> >>>> There is little transparency into the Threat Triage process itself.
>>> >> When
>>> >>>> Threat Triage runs, all I get is a score.  I don't know how that
>>> score
>>> >> was
>>> >>>> arrived at, which rules were triggered, and the specific values that
>>> >> caused
>>> >>>> a rule to trigger.
>>> >>>>
>>> >>>> More specifically, there is no way to generate a message that looks
>>> like
>>> >>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
>>> the
>>> >>>> threshold of '202'".  This makes it difficult for an analyst to
>>> action
>>> >> the
>>> >>>> alert.
>>> >>>>
>>> >>>> *Proposed Solution*
>>> >>>>
>>> >>>> To improve the transparency of the Threat Triage process, I am
>>> proposing
>>> >>>> these enhancements.
>>> >>>>
>>> >>>> 1. Threat Triage should attach to each message all of the rules that
>>> >> fired
>>> >>>> in addition to the total calculated threat triage score.
>>> >>>>
>>> >>>> 2. Threat Triage should allow a custom message to be generated for
>>> each
>>> >>>> rule.  The custom message would allow for some form of string
>>> >> interpolation
>>> >>>> so that I can add specific values from each message to the generated
>>> >>>> alert.  We could allow this in one or both of the new fields that
>>> Casey
>>> >>>> just added, name and comment.
>>> >>>>
>>> >>>>
>>> >>>> *Example*
>>> >>>>
>>> >>>> 1. In this example, we have a telemetry message with a field called
>>> >> 'value'
>>> >>>> that we need to monitor.  In Enrichment, I calculate some sort of
>>> value
>>> >>>> threshold, over which an alert should be generated.
>>> >>>>
>>> >>>>
>>> >>>> 2. In Threat Triage, I use the calculated value threshold to alert
>>> on
>>> >> any
>>> >>>> message that has a value exceeding this threshold.
>>> >>>>
>>> >>>> 3. I can embed values from the message, like the hostname, value,
>>> and
>>> >> value
>>> >>>> threshold, into the alert produced by Threat Triage.  Notice that I
>>> am
>>> >>>> using ${this} for string interpolation, but it could be any syntax
>>> that
>>> >> we
>>> >>>> choose.
>>> >>>>
>>> >>>>
>>> >>>> "triageConfig" : {
>>> >>>> "riskLevelRules" : [
>>> >>>>   {
>>> >>>>     "name" : "Abnormal Value",
>>> >>>>     "comment" : "For ${hostname}; the value ${value} exceeds
>>> threshold
>>> >> of
>>> >>>> ${value_threshold}",
>>> >>>>     "rule" : "value > value_threshold",
>>> >>>>     "score" : 10
>>> >>>>   }
>>> >>>> ],
>>> >>>> "aggregator" : "MAX"
>>> >>>> }
>>> >>>>
>>> >>>>
>>> >>>> 4. The Threat Triage process today would add only the total
>>> calculated
>>> >>>> score.
>>> >>>>
>>> >>>> "threat.triage.level": 10.0
>>> >>>>
>>> >>>>
>>> >>>> With this proposal, Threat Triage would add the following to the
>>> >> message.
>>> >>>>
>>> >>>> Notice how each of the ${variables} have been replaced with the
>>> actual
>>> >>>> values extracted from the message.  This allows for more contextual
>>> >>>> information to action the alert.
>>> >>>>
>>> >>>> "threat.triage": {
>>> >>>>   "score": 10.0,
>>> >>>>   "rules": [
>>> >>>>     {
>>> >>>>       "name": "Abnormal Value",
>>> >>>>       "comment" : "For 10.0.0.1; the value 101 exceeds threshold of
>>> >> 42",
>>> >>>>       "score" : 10
>>> >>>>     }
>>> >>>>   ]
>>> >>>> }
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> What do you think?  Any alternative ideas?
>>> >>>>
>>> >>>>
>>> >>>>

Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
I created 3 separate JIRAs to track the "Threat Triage Transparency"
portion of the work falling out of this discussion thread.  The first would
create a mechanism to do string interpolation.  The second would enhance
threat triage to use the string interpolation.  The third would enhance the
output of threat triage.

[1] Create String Formatting Function for Stellar
https://issues.apache.org/jira/browse/METRON-687

[2] Allow Threat Triage Comment Field to Contain Stellar Expressions
https://issues.apache.org/jira/browse/METRON-688

[3] Record of Rule Set that Fired During Threat Triage
https://issues.apache.org/jira/browse/METRON-686

Please let me know if anyone's concerns were not captured.  I will create
additional JIRAs for the other portion of the effort (*Triage Calculated
Values from the Profiler*) once I've given everyone a little more time to
voice an opinion.
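
For reference, once [1] and [2] land, the motivating alert from the top
of this thread might be expressed in a rule's comment field as something
like this (the exact function name and semantics will be settled in the
JIRAs, and 'flow_count'/'flow_threshold' are illustrative field names):

"comment" : "FORMAT('The host %s has %d inbound flows, exceeding the
threshold of %d', hostname, flow_count, flow_threshold)"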
​


Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
Oh, I see.  Yes, very useful.


On Thu, Feb 2, 2017 at 9:39 AM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> That’s a part of it, certainly (and fixes another of my bug bears, so
> thank you!)
>
> In addition to the aggregation being stellar, I want score to be a stellar
> statement, I’ve put in a separate ticket for that.
> https://issues.apache.org/jira/browse/METRON-685 <
> https://issues.apache.org/jira/browse/METRON-685>
>
> Simon
>
> > On 2 Feb 2017, at 14:31, Nick Allen <ni...@nickallen.org> wrote:
> >
> >> I would much rather be able to say something like score = some stellar
> >> statement that returns a float...
> >
> >
> > Completely agree.  FYI - We added METRON-683 yesterday that I believe
> > supports what you are saying.  Feel free to add commentary.
> >
> > https://issues.apache.org/jira/browse/METRON-683
> >
> > On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
> > simon@simonellistonball.com> wrote:
> >
> >> I completely agree with Nick’s transparency comments, and like the
> design
> >> of the configuration, especially provision for messaging around the
> nature
> >> of the rule fired.
> >>
> >> I would just like to add a small point on the capabilities here. If the
> >> message could have embedded values through some sort of template for a
> >> stellar statement, it would make for a better more dynamic alert reason.
> >>
> >> I would also like to see the score field capable of outputting the value
> >> of a stellar statement. At the moment the idea of a static score being
> >> passed on means that if I have a probabilistic result I want to combine
> >> with other triage sources, I have to do a lot of bucketing into fixed
> >> values. I would much rather be able to say something like score = some
> >> stellar statement that returns a float, ‘alertness' = threshold of this.
> >> That way I can combine multiple triage rules to trigger an overall
> alert,
> >> making the aggregators more meaningful.
> >>
> >> Simon
> >>
> >>
> >>> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
> >>>
> >>> For profiler alerts it will be helpful during analysis to see the
> alerts
> >> that caused the anomaly.  The meta alert is useful for incidents
> involving
> >> correlation of multiple events.
> >>>
> >>> Also you will need to filter out known hosts that trigger anomalies.
> >> For example vulnerability scanning software.
> >>>
> >>> One final thing to consider is anomalies happen every day without a
> >> security incident.  Depending on the network the profiler alerts could
> get
> >> very noisy so it might be better to correlate profiler alerts with other
> >> alerts.
> >>>
> >>> Thanks
> >>> Carolyn
> >>>
> >>>
> >>>
> >>> Sent from my Verizon, Samsung Galaxy smartphone
> >>>
> >>>
> >>> -------- Original message --------
> >>> From: Casey Stella <ce...@gmail.com>
> >>> Date: 2/1/17 2:28 PM (GMT-05:00)
> >>> To: dev@metron.incubator.apache.org
> >>> Subject: Re: [Discuss] Improve Alerting
> >>>
> >>> I like the direction.  One thing that we may want is for comment to
> just
> >> be
> >>> a stellar expression and construct a function to essentially do
> >>> String.format().  So, that'd become:
> >>> "triageConfig" : {
> >>> "riskLevelRules" : [
> >>>   {
> >>>     "name" : "Abnormal Value",
> >>>     "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
> >>> hostname, value, value_threshold)"
> >>>     "rule" : "value > value_threshold",
> >>>     "score" : 10
> >>>   }
> >>> ],
> >>> "aggregator" : "MAX"
> >>> }
> >>>
> >>> The reason:
> >>>
> >>>  - It's integrated and stellar is our default scripting layer
> >>>  - It supports doing some computation in the message
> >>>
> >>>
> >>> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:
> >>>
> >>>> Like I said, here is a proposed solution to one of the gaps I
> >> identified in
> >>>> the previous email.
> >>>>
> >>>> *Problem*
> >>>>
> >>>> There is little transparency into the Threat Triage process itself.
> >> When
> >>>> Threat Triage runs, all I get is a score.  I don't know how that score
> >> was
> >>>> arrived at, which rules were triggered, and the specific values that
> >> caused
> >>>> a rule to trigger.
> >>>>
> >>>> More specifically, there is no way to generate a message that looks
> like
> >>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
> the
> >>>> threshold of '202'".  This makes it difficult for an analyst to action
> >> the
> >>>> alert.
> >>>>
> >>>> *Proposed Solution*
> >>>>
> >>>> To improve the transparency of the Threat Triage process, I am
> proposing
> >>>> these enhancements.
> >>>>
> >>>> 1. Threat Triage should attach to each message all of the rules that
> >> fired
> >>>> in addition to the total calculated threat triage score.
> >>>>
> >>>> 2. Threat Triage should allow a custom message to be generated for
> each
> >>>> rule.  The custom message would allow for some form of string
> >> interpolation
> >>>> so that I can add specific values from each message to the generated
> >>>> alert.  We could allow this in one or both of the new fields that
> Casey
> >>>> just added, name and comment.
> >>>>
> >>>>
> >>>> *Example*
> >>>>
> >>>> 1. In this example, we have a telemetry message with a field called
> >> 'value'
> >>>> that we need to monitor.  In Enrichment, I calculate some sort of
> value
> >>>> threshold, over which an alert should be generated.
> >>>>
> >>>>
> >>>> 2. In Threat Triage, I use the calculated value threshold to alert on
> >> any
> >>>> message that has a value exceeding this threshold.
> >>>>
> >>>> 3. I can embed values from the message, like the hostname, value, and
> >> value
> >>>> threshold, into the alert produced by Threat Triage.  Notice that I am
> >>>> using ${this} for string interpolation, but it could be any syntax
> that
> >> we
> >>>> choose.
> >>>>
> >>>>
> >>>> "triageConfig" : {
> >>>> "riskLevelRules" : [
> >>>>   {
> >>>>     "name" : "Abnormal Value",
> >>>>     "comment" : "For ${hostname}; the value ${value} exceeds threshold
> >> of
> >>>> ${value_threshold}",
> >>>>     "rule" : "value > value_threshold",
> >>>>     "score" : 10
> >>>>   }
> >>>> ],
> >>>> "aggregator" : "MAX"
> >>>> }
> >>>>
> >>>>
> >>>> 4. The Threat Triage process today would add only the total calculated
> >>>> score.
> >>>>
> >>>> "threat.triage.level": 10.0
> >>>>
> >>>>
> >>>> With this proposal, Threat Triage would add the following to the
> >> message.
> >>>>
> >>>> Notice how each of the ${variables} have been replaced with the actual
> >>>> values extracted from the message.  This allows for more contextual
> >>>> information to action the alert.
> >>>>
> >>>> "threat.triage": {
> >>>>   "score": 10.0,
> >>>>   "rules": [
> >>>>     {
> >>>>       "name": "Abnormal Value",
> >>>>       "comment" : "For 10.0.0.1; the value 101 exceeds threshold of
> >> 42",
> >>>>       "score" : 10
> >>>>     }
> >>>>   ]
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> What do you think?  Any alternative ideas?
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <ni...@nickallen.org>
> wrote:
> >>>>
> >>>>> I'd like to explore the functionality that we have in Metron using a
> >>>>> motivating example.  I think this will help highlight some gaps where
> >> we
> >>>>> can enhance Metron.
> >>>>>
> >>>>> The motivating example is that I would like to create an alert if the
> >>>>> number of inbound flows to any host over a 15 minute interval is
> >>>> abnormal.
> >>>>> I would like the alert to contain the specific information below to
> >>>>> streamline the triage process.
> >>>>>
> >>>>> Rule: Abnormal number of inbound flows
> >>>>> Bin: 15 mins
> >>>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows,
> >> exceeding
> >>>>> the threshold of '202'
> >>>>>
> >>>>>
> >>>>> *What Works*
> >>>>>
> >>>>> In some ways, this example is similar to the "Outlier Detection" demo
> >>>> that
> >>>>> I performed with the Profiler a few months back.   We have most of
> what
> >>>> we
> >>>>> need to do this with a couple caveats.
> >>>>>
> >>>>> 1. An enrichment would be added to enrich the message with the
> correct
> >>>>> internal hostname 'powned.svr.bank.com'.
> >>>>>
> >>>>> 2. With the Profiler, I can capture some idea of what "normal" is for
> >> the
> >>>>> number of inbound flows across 15 minute intervals.
> >>>>> 3. With Threat Triage, I can create rules that alert when a value
> >> exceeds
> >>>>> what the Profiler defines as normal.
> >>>>>
> >>>>>
> >>>>> *What's Missing*
> >>>>>
> >>>>> Its nice to know that we are almost all the way there with this
> >> example.
> >>>>> Unfortunately, there are two gaps that fall out of this.
> >>>>>
> >>>>> 1. *Threat Triage Transparency*
> >>>>>
> >>>>> There is little transparency into the Threat Triage process itself.
> >> When
> >>>>> Threat Triage runs, all I get is a score.  I don't know how that
> score
> >>>> was
> >>>>> arrived at, which rules were triggered, and the specific values that
> >>>> caused
> >>>>> a rule to trigger.
> >>>>>
> >>>>> More specifically, there is no way to generate a message that looks
> >> like
> >>>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
> the
> >>>>> threshold of '202'".
> >>>>>
> >>>>>
> >>>>> 2. *Triage Calculated Values from the Profiler*
> >>>>>
> >>>>> Also, the value being interrogated here, the number of inbound flows,
> >> is
> >>>>> not a static value contained within any single telemetry message.
> This
> >>>>> value is calculated across multiple messages by the Profiler.  The
> >>>> current
> >>>>> Threat Triage process cannot be used to interrogate values calculated
> >> by
> >>>>> the Profiler.
> >>>>>
> >>>>>
> >>>>> To try and keep this email concise and digestible, I am going to
> send a
> >>>>> follow-on discussing proposed solutions for each of these separately.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Nick Allen <ni...@nickallen.org>
> >>>>
> >>
> >>
>
>

Re: [Discuss] Improve Alerting

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
That’s a part of it, certainly (and fixes another of my bugbears, so thank you!)

In addition to the aggregation being stellar, I want the score to be a stellar statement; I’ve put in a separate ticket for that: https://issues.apache.org/jira/browse/METRON-685
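
To make that concrete, here is a minimal sketch of what a rule could look
like if 'score' accepted a stellar expression instead of a constant.  None
of this is existing syntax; the expression, and the idea of scaling the
score by how far the value exceeds the threshold, are purely illustrative.

"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Abnormal Value",
      "comment" : "Value exceeds the calculated threshold",
      "rule" : "value > value_threshold",
      "score" : "10 * value / value_threshold"
    }
  ],
  "aggregator" : "MAX"
}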

Simon

> On 2 Feb 2017, at 14:31, Nick Allen <ni...@nickallen.org> wrote:
> 
>> I would much rather be able to say something like score = some stellar
>> statement that returns a float...
> 
> 
> Completely agree.  FYI - We added METRON-683 yesterday that I believe
> supports what you are saying.  Feel free to add commentary.
> 
> https://issues.apache.org/jira/browse/METRON-683
> 
> On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
> simon@simonellistonball.com> wrote:
> 
>> I completely agree with Nick’s transparency comments, and like the design
>> of the configuration, especially provision for messaging around the nature
>> of the rule fired.
>> 
>> I would just like to add a small point on the capabilities here. If the
>> message could have embedded values through some sort of template for a
>> stellar statement, it would make for a better more dynamic alert reason.
>> 
>> I would also like to see the score field capable of outputting the value
>> of a stellar statement. At the moment the idea of a static score being
>> passed on means that if I have a probabilistic result I want to combine
>> with other triage sources, I have to do a lot of bucketing into fixed
>> values. I would much rather be able to say something like score = some
>> stellar statement that returns a float, ‘alertness' = threshold of this.
>> That way I can combine multiple triage rules to trigger an overall alert,
>> making the aggregators more meaningful.
>> 
>> Simon
>> 
>> 
>>> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
>>> 
>>> For profiler alerts it will be helpful during analysis to see the alerts
>> that caused the anomaly.  The meta alert is useful for incidents involving
>> correlation of multiple events.
>>> 
>>> Also you will need to filter out known hosts that trigger anomalies.
>> For example vulnerability scanning software.
>>> 
>>> One final thing to consider is anomalies happen every day without a
>> security incident.  Depending on the network the profiler alerts could get
>> very noisy so it might be better to correlate profiler alerts with other
>> alerts.
>>> 
>>> Thanks
>>> Carolyn
>>> 
>>> 
>>> 
>>> Sent from my Verizon, Samsung Galaxy smartphone
>>> 
>>> 
>>> -------- Original message --------
>>> From: Casey Stella <ce...@gmail.com>
>>> Date: 2/1/17 2:28 PM (GMT-05:00)
>>> To: dev@metron.incubator.apache.org
>>> Subject: Re: [Discuss] Improve Alerting
>>> 
>>> I like the direction.  One thing that we may want is for comment to just
>> be
>>> a stellar expression and construct a function to essentially do
>>> String.format().  So, that'd become:
>>> "triageConfig" : {
>>> "riskLevelRules" : [
>>>   {
>>>     "name" : "Abnormal Value",
>>>     "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
>>> hostname, value, value_threshold)"
>>>     "rule" : "value > value_threshold",
>>>     "score" : 10
>>>   }
>>> ],
>>> "aggregator" : "MAX"
>>> }
>>> 
>>> The reason:
>>> 
>>>  - It's integrated and stellar is our default scripting layer
>>>  - It supports doing some computation in the message
>>> 
>>> 
>>> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:
>>> 
>>>> Like I said, here is a proposed solution to one of the gaps I
>> identified in
>>>> the previous email.
>>>> 
>>>> *Problem*
>>>> 
>>>> There is little transparency into the Threat Triage process itself.
>> When
>>>> Threat Triage runs, all I get is a score.  I don't know how that score
>> was
>>>> arrived at, which rules were triggered, and the specific values that
>> caused
>>>> a rule to trigger.
>>>> 
>>>> More specifically, there is no way to generate a message that looks like
>>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>>>> threshold of '202'".  This makes it difficult for an analyst to action
>> the
>>>> alert.
>>>> 
>>>> *Proposed Solution*
>>>> 
>>>> To improve the transparency of the Threat Triage process, I am proposing
>>>> these enhancements.
>>>> 
>>>> 1. Threat Triage should attach to each message all of the rules that
>> fired
>>>> in addition to the total calculated threat triage score.
>>>> 
>>>> 2. Threat Triage should allow a custom message to be generated for each
>>>> rule.  The custom message would allow for some form of string
>> interpolation
>>>> so that I can add specific values from each message to the generated
>>>> alert.  We could allow this in one or both of the new fields that Casey
>>>> just added, name and comment.
>>>> 
>>>> 
>>>> *Example*
>>>> 
>>>> 1. In this example, we have a telemetry message with a field called
>> 'value'
>>>> that we need to monitor.  In Enrichment, I calculate some sort of value
>>>> threshold, over which an alert should be generated.
>>>> 
>>>> 
>>>> 2. In Threat Triage, I use the calculated value threshold to alert on
>> any
>>>> message that has a value exceeding this threshold.
>>>> 
>>>> 3. I can embed values from the message, like the hostname, value, and
>> value
>>>> threshold, into the alert produced by Threat Triage.  Notice that I am
>>>> using ${this} for string interpolation, but it could be any syntax that
>> we
>>>> choose.
>>>> 
>>>> 
>>>> "triageConfig" : {
>>>> "riskLevelRules" : [
>>>>   {
>>>>     "name" : "Abnormal Value",
>>>>     "comment" : "For ${hostname}; the value ${value} exceeds threshold
>> of
>>>> ${value_threshold}",
>>>>     "rule" : "value > value_threshold",
>>>>     "score" : 10
>>>>   }
>>>> ],
>>>> "aggregator" : "MAX"
>>>> }
>>>> 
>>>> 
>>>> 4. The Threat Triage process today would add only the total calculated
>>>> score.
>>>> 
>>>> "threat.triage.level": 10.0
>>>> 
>>>> 
>>>> With this proposal, Threat Triage would add the following to the
>> message.
>>>> 
>>>> Notice how each of the ${variables} have been replaced with the actual
>>>> values extracted from the message.  This allows for more contextual
>>>> information to action the alert.
>>>> 
>>>> "threat.triage": {
>>>>   "score": 10.0,
>>>>   "rules": [
>>>>     {
>>>>       "name": "Abnormal Value",
>>>>       "comment" : "For 10.0.0.1; the value 101 exceeds threshold of
>> 42",
>>>>       "score" : 10
>>>>     }
>>>>   ]
>>>> }
>>>> 
>>>> 
>>>> 
>>>> What do you think?  Any alternative ideas?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <ni...@nickallen.org> wrote:
>>>> 
>>>>> I'd like to explore the functionality that we have in Metron using a
>>>>> motivating example.  I think this will help highlight some gaps where
>> we
>>>>> can enhance Metron.
>>>>> 
>>>>> The motivating example is that I would like to create an alert if the
>>>>> number of inbound flows to any host over a 15 minute interval is
>>>> abnormal.
>>>>> I would like the alert to contain the specific information below to
>>>>> streamline the triage process.
>>>>> 
>>>>> Rule: Abnormal number of inbound flows
>>>>> Bin: 15 mins
>>>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows,
>> exceeding
>>>>> the threshold of '202'
>>>>> 
>>>>> 
>>>>> *What Works*
>>>>> 
>>>>> In some ways, this example is similar to the "Outlier Detection" demo
>>>> that
>>>>> I performed with the Profiler a few months back.   We have most of what
>>>> we
>>>>> need to do this with a couple caveats.
>>>>> 
>>>>> 1. An enrichment would be added to enrich the message with the correct
>>>>> internal hostname 'powned.svr.bank.com'.
>>>>> 
>>>>> 2. With the Profiler, I can capture some idea of what "normal" is for
>> the
>>>>> number of inbound flows across 15 minute intervals.
>>>>> 3. With Threat Triage, I can create rules that alert when a value
>> exceeds
>>>>> what the Profiler defines as normal.
>>>>> 
>>>>> 
>>>>> *What's Missing*
>>>>> 
>>>>> Its nice to know that we are almost all the way there with this
>> example.
>>>>> Unfortunately, there are two gaps that fall out of this.
>>>>> 
>>>>> 1. *Threat Triage Transparency*
>>>>> 
>>>>> There is little transparency into the Threat Triage process itself.
>> When
>>>>> Threat Triage runs, all I get is a score.  I don't know how that score
>>>> was
>>>>> arrived at, which rules were triggered, and the specific values that
>>>> caused
>>>>> a rule to trigger.
>>>>> 
>>>>> More specifically, there is no way to generate a message that looks
>> like
>>>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>>>>> threshold of '202'".
>>>>> 
>>>>> 
>>>>> 2. *Triage Calculated Values from the Profiler*
>>>>> 
>>>>> Also, the value being interrogated here, the number of inbound flows,
>> is
>>>>> not a static value contained within any single telemetry message.  This
>>>>> value is calculated across multiple messages by the Profiler.  The
>>>> current
>>>>> Threat Triage process cannot be used to interrogate values calculated
>> by
>>>>> the Profiler.
>>>>> 
>>>>> 
>>>>> To try and keep this email concise and digestible, I am going to send a
>>>>> follow-on discussing proposed solutions for each of these separately.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Nick Allen <ni...@nickallen.org>
>>>> 
>> 
>> 


Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
> I would much rather be able to say something like score = some stellar
> statement that returns a float...


Completely agree.  FYI - We added METRON-683 yesterday that I believe
supports what you are saying.  Feel free to add commentary.

https://issues.apache.org/jira/browse/METRON-683

On Thu, Feb 2, 2017 at 9:02 AM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> I completely agree with Nick’s transparency comments, and like the design
> of the configuration, especially provision for messaging around the nature
> of the rule fired.
>
> I would just like to add a small point on the capabilities here. If the
> message could have embedded values through some sort of template for a
> stellar statement, it would make for a better more dynamic alert reason.
>
> I would also like to see the score field capable of outputting the value
> of a stellar statement. At the moment the idea of a static score being
> passed on means that if I have a probabilistic result I want to combine
> with other triage sources, I have to do a lot of bucketing into fixed
> values. I would much rather be able to say something like score = some
> stellar statement that returns a float, ‘alertness' = threshold of this.
> That way I can combine multiple triage rules to trigger an overall alert,
> making the aggregators more meaningful.
>
> Simon
>
>
> > On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
> >
> > For profiler alerts it will be helpful during analysis to see the alerts
> that caused the anomaly.  The meta alert is useful for incidents involving
> correlation of multiple events.
> >
> > Also you will need to filter out known hosts that trigger anomalies.
> For example vulnerability scanning software.
> >
> > One final thing to consider is anomalies happen every day without a
> security incident.  Depending on the network the profiler alerts could get
> very noisy so it might be better to correlate profiler alerts with other
> alerts.
> >
> > Thanks
> > Carolyn
> >
> >
> >
> > Sent from my Verizon, Samsung Galaxy smartphone
> >
> >
> > -------- Original message --------
> > From: Casey Stella <ce...@gmail.com>
> > Date: 2/1/17 2:28 PM (GMT-05:00)
> > To: dev@metron.incubator.apache.org
> > Subject: Re: [Discuss] Improve Alerting
> >
> > I like the direction.  One thing that we may want is for comment to just
> be
> > a stellar expression and construct a function to essentially do
> > String.format().  So, that'd become:
> > "triageConfig" : {
> >  "riskLevelRules" : [
> >    {
> >      "name" : "Abnormal Value",
> >      "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
> > hostname, value, value_threshold)"
> >      "rule" : "value > value_threshold",
> >      "score" : 10
> >    }
> >  ],
> >  "aggregator" : "MAX"
> > }
> >
> > The reason:
> >
> >   - It's integrated and stellar is our default scripting layer
> >   - It supports doing some computation in the message
> >
> >
> > On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:
> >
> >> Like I said, here is a proposed solution to one of the gaps I
> identified in
> >> the previous email.
> >>
> >> *Problem*
> >>
> >> There is little transparency into the Threat Triage process itself.
> When
> >> Threat Triage runs, all I get is a score.  I don't know how that score
> was
> >> arrived at, which rules were triggered, and the specific values that
> caused
> >> a rule to trigger.
> >>
> >> More specifically, there is no way to generate a message that looks like
> >> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> >> threshold of '202'".  This makes it difficult for an analyst to action
> the
> >> alert.
> >>
> >> *Proposed Solution*
> >>
> >> To improve the transparency of the Threat Triage process, I am proposing
> >> these enhancements.
> >>
> >> 1. Threat Triage should attach to each message all of the rules that
> fired
> >> in addition to the total calculated threat triage score.
> >>
> >> 2. Threat Triage should allow a custom message to be generated for each
> >> rule.  The custom message would allow for some form of string
> interpolation
> >> so that I can add specific values from each message to the generated
> >> alert.  We could allow this in one or both of the new fields that Casey
> >> just added, name and comment.
> >>
> >>
> >> *Example*
> >>
> >> 1. In this example, we have a telemetry message with a field called
> 'value'
> >> that we need to monitor.  In Enrichment, I calculate some sort of value
> >> threshold, over which an alert should be generated.
> >>
> >>
> >> 2. In Threat Triage, I use the calculated value threshold to alert on
> any
> >> message that has a value exceeding this threshold.
> >>
> >> 3. I can embed values from the message, like the hostname, value, and
> value
> >> threshold, into the alert produced by Threat Triage.  Notice that I am
> >> using ${this} for string interpolation, but it could be any syntax that
> we
> >> choose.
> >>
> >>
> >> "triageConfig" : {
> >>  "riskLevelRules" : [
> >>    {
> >>      "name" : "Abnormal Value",
> >>      "comment" : "For ${hostname}; the value ${value} exceeds threshold
> of
> >> ${value_threshold}",
> >>      "rule" : "value > value_threshold",
> >>      "score" : 10
> >>    }
> >>  ],
> >>  "aggregator" : "MAX"
> >> }
> >>
> >>
> >> 4. The Threat Triage process today would add only the total calculated
> >> score.
> >>
> >> "threat.triage.level": 10.0
> >>
> >>
> >> With this proposal, Threat Triage would add the following to the
> message.
> >>
> >> Notice how each of the ${variables} have been replaced with the actual
> >> values extracted from the message.  This allows for more contextual
> >> information to action the alert.
> >>
> >> "threat.triage": {
> >>    "score": 10.0,
> >>    "rules": [
> >>      {
> >>        "name": "Abnormal Value",
> >>        "comment" : "For 10.0.0.1; the value 101 exceeds threshold of
> 42",
> >>        "score" : 10
> >>      }
> >>    ]
> >> }
> >>
> >>
> >>
> >> What do you think?  Any alternative ideas?
> >>
> >>
> >>
> >> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <ni...@nickallen.org> wrote:
> >>
> >>> I'd like to explore the functionality that we have in Metron using a
> >>> motivating example.  I think this will help highlight some gaps where
> we
> >>> can enhance Metron.
> >>>
> >>> The motivating example is that I would like to create an alert if the
> >>> number of inbound flows to any host over a 15 minute interval is
> >> abnormal.
> >>> I would like the alert to contain the specific information below to
> >>> streamline the triage process.
> >>>
> >>> Rule: Abnormal number of inbound flows
> >>> Bin: 15 mins
> >>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows,
> exceeding
> >>> the threshold of '202'
> >>>
> >>>
> >>> *What Works*
> >>>
> >>> In some ways, this example is similar to the "Outlier Detection" demo
> >> that
> >>> I performed with the Profiler a few months back.   We have most of what
> >> we
> >>> need to do this with a couple caveats.
> >>>
> >>> 1. An enrichment would be added to enrich the message with the correct
> >>> internal hostname 'powned.svr.bank.com'.
> >>>
> >>> 2. With the Profiler, I can capture some idea of what "normal" is for
> the
> >>> number of inbound flows across 15 minute intervals.
> >>> 3. With Threat Triage, I can create rules that alert when a value
> exceeds
> >>> what the Profiler defines as normal.
> >>>
> >>>
> >>> *What's Missing*
> >>>
> >>> Its nice to know that we are almost all the way there with this
> example.
> >>> Unfortunately, there are two gaps that fall out of this.
> >>>
> >>> 1. *Threat Triage Transparency*
> >>>
> >>> There is little transparency into the Threat Triage process itself.
> When
> >>> Threat Triage runs, all I get is a score.  I don't know how that score
> >> was
> >>> arrived at, which rules were triggered, and the specific values that
> >> caused
> >>> a rule to trigger.
> >>>
> >>> More specifically, there is no way to generate a message that looks
> like
> >>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> >>> threshold of '202'".
> >>>
> >>>
> >>> 2. *Triage Calculated Values from the Profiler*
> >>>
> >>> Also, the value being interrogated here, the number of inbound flows,
> is
> >>> not a static value contained within any single telemetry message.  This
> >>> value is calculated across multiple messages by the Profiler.  The
> >> current
> >>> Threat Triage process cannot be used to interrogate values calculated
> by
> >>> the Profiler.
> >>>
> >>>
> >>> To try and keep this email concise and digestible, I am going to send a
> >>> follow-on discussing proposed solutions for each of these separately.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Nick Allen <ni...@nickallen.org>
> >>
>
>

Re: [Discuss] Improve Alerting

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
I completely agree with Nick’s transparency comments, and like the design of the configuration, especially the provision for messaging around the nature of the rule that fired.

I would just like to add a small point on the capabilities here. If the message could have values embedded through some sort of template for a stellar statement, it would make for a better, more dynamic alert reason.

I would also like to see the score field capable of outputting the value of a stellar statement. At the moment the idea of a static score being passed on means that if I have a probabilistic result that I want to combine with other triage sources, I have to do a lot of bucketing into fixed values. I would much rather be able to say something like score = some stellar statement that returns a float, ‘alertness’ = threshold of this. That way I can combine multiple triage rules to trigger an overall alert, making the aggregators more meaningful.
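
As a rough sketch of the shape of that idea (none of this is existing or
proposed syntax; the 'alertness' field, the threat_score variable, and the
field names are hypothetical, only meant to show how float-valued scores
from several rules might combine):

"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Abnormal inbound flows",
      "rule" : "flow_count > flow_threshold",
      "score" : "flow_count / flow_threshold"
    },
    {
      "name" : "Poor host reputation",
      "rule" : "reputation_score > 0",
      "score" : "reputation_score"
    }
  ],
  "aggregator" : "SUM",
  "alertness" : "threat_score > 5"
}

The alert decision then falls out of the combined score rather than out of
any single rule's fixed value.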

Simon


> On 2 Feb 2017, at 12:40, Carolyn Duby <cd...@hortonworks.com> wrote:
> 
> For profiler alerts it will be helpful during analysis to see the alerts that caused the anomaly.  The meta alert is useful for incidents involving correlation of multiple events.
> 
> Also you will need to filter out known hosts that trigger anomalies.  For example vulnerability scanning software.
> 
> One final thing to consider is anomalies happen every day without a security incident.  Depending on the network the profiler alerts could get very noisy so it might be better to correlate profiler alerts with other alerts.
> 
> Thanks
> Carolyn
> 
> 
> 
> Sent from my Verizon, Samsung Galaxy smartphone
> 
> 
> -------- Original message --------
> From: Casey Stella <ce...@gmail.com>
> Date: 2/1/17 2:28 PM (GMT-05:00)
> To: dev@metron.incubator.apache.org
> Subject: Re: [Discuss] Improve Alerting
> 
> I like the direction.  One thing that we may want is for comment to just be
> a stellar expression and construct a function to essentially do
> String.format().  So, that'd become:
> "triageConfig" : {
>  "riskLevelRules" : [
>    {
>      "name" : "Abnormal Value",
>      "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
> hostname, value, value_threshold)"
>      "rule" : "value > value_threshold",
>      "score" : 10
>    }
>  ],
>  "aggregator" : "MAX"
> }
> 
> The reason:
> 
>   - It's integrated and stellar is our default scripting layer
>   - It supports doing some computation in the message
> 
> 
> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:
> 
>> Like I said, here is a proposed solution to one of the gaps I identified in
>> the previous email.
>> 
>> *Problem*
>> 
>> There is little transparency into the Threat Triage process itself.  When
>> Threat Triage runs, all I get is a score.  I don't know how that score was
>> arrived at, which rules were triggered, and the specific values that caused
>> a rule to trigger.
>> 
>> More specifically, there is no way to generate a message that looks like
>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>> threshold of '202'".  This makes it difficult for an analyst to action the
>> alert.
>> 
>> *Proposed Solution*
>> 
>> To improve the transparency of the Threat Triage process, I am proposing
>> these enhancements.
>> 
>> 1. Threat Triage should attach to each message all of the rules that fired
>> in addition to the total calculated threat triage score.
>> 
>> 2. Threat Triage should allow a custom message to be generated for each
>> rule.  The custom message would allow for some form of string interpolation
>> so that I can add specific values from each message to the generated
>> alert.  We could allow this in one or both of the new fields that Casey
>> just added, name and comment.
>> 
>> 
>> *Example*
>> 
>> 1. In this example, we have a telemetry message with a field called 'value'
>> that we need to monitor.  In Enrichment, I calculate some sort of value
>> threshold, over which an alert should be generated.
>> 
>> 
>> 2. In Threat Triage, I use the calculated value threshold to alert on any
>> message that has a value exceeding this threshold.
>> 
>> 3. I can embed values from the message, like the hostname, value, and value
>> threshold, into the alert produced by Threat Triage.  Notice that I am
>> using ${this} for string interpolation, but it could be any syntax that we
>> choose.
>> 
>> 
>> "triageConfig" : {
>>  "riskLevelRules" : [
>>    {
>>      "name" : "Abnormal Value",
>>      "comment" : "For ${hostname}; the value ${value} exceeds threshold of
>> ${value_threshold}",
>>      "rule" : "value > value_threshold",
>>      "score" : 10
>>    }
>>  ],
>>  "aggregator" : "MAX"
>> }
>> 
>> 
>> 4. The Threat Triage process today would add only the total calculated
>> score.
>> 
>> "threat.triage.level": 10.0
>> 
>> 
>> With this proposal, Threat Triage would add the following to the message.
>> 
>> Notice how each of the ${variables} have been replaced with the actual
>> values extracted from the message.  This allows for more contextual
>> information to action the alert.
>> 
>> "threat.triage": {
>>    "score": 10.0,
>>    "rules": [
>>      {
>>        "name": "Abnormal Value",
>>        "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
>>        "score" : 10
>>      }
>>    ]
>> }
>> 
>> 
>> 
>> What do you think?  Any alternative ideas?
>> 
>> 
>> 
>> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <ni...@nickallen.org> wrote:
>> 
>>> I'd like to explore the functionality that we have in Metron using a
>>> motivating example.  I think this will help highlight some gaps where we
>>> can enhance Metron.
>>> 
>>> The motivating example is that I would like to create an alert if the
>>> number of inbound flows to any host over a 15 minute interval is
>> abnormal.
>>> I would like the alert to contain the specific information below to
>>> streamline the triage process.
>>> 
>>> Rule: Abnormal number of inbound flows
>>> Bin: 15 mins
>>> Alert: The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
>>> the threshold of '202'
>>> 
>>> 
>>> *What Works*
>>> 
>>> In some ways, this example is similar to the "Outlier Detection" demo
>> that
>>> I performed with the Profiler a few months back.   We have most of what
>> we
>>> need to do this with a couple caveats.
>>> 
>>> 1. An enrichment would be added to enrich the message with the correct
>>> internal hostname 'powned.svr.bank.com'.
>>> 
>>> 2. With the Profiler, I can capture some idea of what "normal" is for the
>>> number of inbound flows across 15 minute intervals.
>>> 3. With Threat Triage, I can create rules that alert when a value exceeds
>>> what the Profiler defines as normal.
>>> 
>>> 
>>> *What's Missing*
>>> 
>>> Its nice to know that we are almost all the way there with this example.
>>> Unfortunately, there are two gaps that fall out of this.
>>> 
>>> 1. *Threat Triage Transparency*
>>> 
>>> There is little transparency into the Threat Triage process itself.  When
>>> Threat Triage runs, all I get is a score.  I don't know how that score
>> was
>>> arrived at, which rules were triggered, and the specific values that
>> caused
>>> a rule to trigger.
>>> 
>>> More specifically, there is no way to generate a message that looks like
>>> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
>>> threshold of '202'".
>>> 
>>> 
>>> 2. *Triage Calculated Values from the Profiler*
>>> 
>>> Also, the value being interrogated here, the number of inbound flows, is
>>> not a static value contained within any single telemetry message.  This
>>> value is calculated across multiple messages by the Profiler.  The
>> current
>>> Threat Triage process cannot be used to interrogate values calculated by
>>> the Profiler.
>>> 
>>> 
>>> To try and keep this email concise and digestible, I am going to send a
>>> follow-on discussing proposed solutions for each of these separately.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Nick Allen <ni...@nickallen.org>
>> 


RE: [Discuss] Improve Alerting

Posted by Carolyn Duby <cd...@hortonworks.com>.
For profiler alerts, it will be helpful during analysis to see the alerts that caused the anomaly.  The meta alert is useful for incidents involving correlation of multiple events.

Also, you will need to filter out known hosts that trigger anomalies, for example vulnerability scanning software.

One final thing to consider is that anomalies happen every day without a security incident.  Depending on the network, the profiler alerts could get very noisy, so it might be better to correlate profiler alerts with other alerts.
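
To make the filtering concrete, the rule's condition itself could exclude
known scanners.  A minimal sketch, assuming the rule language supports a
list membership check; the field names and the hard-coded whitelist are
purely illustrative (in practice the list would come from configuration or
an enrichment):

"riskLevelRules" : [
  {
    "name" : "Abnormal number of inbound flows",
    "comment" : "For ${hostname}; ${flow_count} inbound flows exceeds ${flow_threshold}",
    "rule" : "flow_count > flow_threshold && hostname not in ['vulnscan01.bank.com', 'vulnscan02.bank.com']",
    "score" : 10
  }
]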

Thanks
Carolyn



Sent from my Verizon, Samsung Galaxy smartphone


-------- Original message --------
From: Casey Stella <ce...@gmail.com>
Date: 2/1/17 2:28 PM (GMT-05:00)
To: dev@metron.incubator.apache.org
Subject: Re: [Discuss] Improve Alerting

I like the direction.  One thing that we may want is for comment to just be
a stellar expression and construct a function to essentially do
String.format().  So, that'd become:
"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Abnormal Value",
      "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
hostname, value, value_threshold)"
      "rule" : "value > value_threshold",
      "score" : 10
    }
  ],
  "aggregator" : "MAX"
}

The reason:

   - It's integrated and stellar is our default scripting layer
   - It supports doing some computation in the message


On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:

> Like I said, here is a proposed solution to one of the gaps I identified in
> the previous email.
>
> *Problem*
>
> There is little transparency into the Threat Triage process itself.  When
> Threat Triage runs, all I get is a score.  I don't know how that score was
> arrived at, which rules were triggered, and the specific values that caused
> a rule to trigger.
>
> More specifically, there is no way to generate a message that looks like
> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> threshold of '202'".  This makes it difficult for an analyst to action the
> alert.
>
> *Proposed Solution*
>
> To improve the transparency of the Threat Triage process, I am proposing
> these enhancements.
>
> 1. Threat Triage should attach to each message all of the rules that fired
> in addition to the total calculated threat triage score.
>
> 2. Threat Triage should allow a custom message to be generated for each
> rule.  The custom message would allow for some form of string interpolation
> so that I can add specific values from each message to the generated
> alert.  We could allow this in one or both of the new fields that Casey
> just added, name and comment.
>
>
> *Example*
>
> 1. In this example, we have a telemetry message with a field called 'value'
> that we need to monitor.  In Enrichment, I calculate some sort of value
> threshold, over which an alert should be generated.
>
>
> 2. In Threat Triage, I use the calculated value threshold to alert on any
> message that has a value exceeding this threshold.
>
> 3. I can embed values from the message, like the hostname, value, and value
> threshold, into the alert produced by Threat Triage.  Notice that I am
> using ${this} for string interpolation, but it could be any syntax that we
> choose.
>
>
> "triageConfig" : {
>   "riskLevelRules" : [
>     {
>       "name" : "Abnormal Value",
>       "comment" : "For ${hostname}; the value ${value} exceeds threshold of
> ${value_threshold}",
>       "rule" : "value > value_threshold",
>       "score" : 10
>     }
>   ],
>   "aggregator" : "MAX"
> }
>
>
> 4. The Threat Triage process today would add only the total calculated
> score.
>
> "threat.triage.level": 10.0
>
>
> With this proposal, Threat Triage would add the following to the message.
>
> Notice how each of the ${variables} have been replaced with the actual
> values extracted from the message.  This allows for more contextual
> information to action the alert.
>
> "threat.triage": {
>     "score": 10.0,
>     "rules": [
>       {
>         "name": "Abnormal Value",
>         "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
>         "score" : 10
>       }
>     ]
> }
>
>
>
> What do you think?  Any alternative ideas?
>
>
>
> On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <ni...@nickallen.org> wrote:
>
> > I'd like to explore the functionality that we have in Metron using a
> > motivating example.  I think this will help highlight some gaps where we
> > can enhance Metron.
> >
> > The motivating example is that I would like to create an alert if the
> > number of inbound flows to any host over a 15 minute interval is
> abnormal.
> > I would like the alert to contain the specific information below to
> > streamline the triage process.
> >
> > Rule: Abnormal number of inbound flows
> > Bin: 15 mins
> > Alert: The host 'powned.svr.bank.com' has '230' inbound flows, exceeding
> > the threshold of '202'
> >
> >
> > *What Works*
> >
> > In some ways, this example is similar to the "Outlier Detection" demo
> that
> > I performed with the Profiler a few months back.   We have most of what
> we
> > need to do this with a couple caveats.
> >
> > 1. An enrichment would be added to enrich the message with the correct
> > internal hostname 'powned.svr.bank.com'.
> >
> > 2. With the Profiler, I can capture some idea of what "normal" is for the
> > number of inbound flows across 15 minute intervals.
> > 3. With Threat Triage, I can create rules that alert when a value exceeds
> > what the Profiler defines as normal.
> >
> >
> > *What's Missing*
> >
> > Its nice to know that we are almost all the way there with this example.
> > Unfortunately, there are two gaps that fall out of this.
> >
> >  1. *Threat Triage Transparency*
> >
> > There is little transparency into the Threat Triage process itself.  When
> > Threat Triage runs, all I get is a score.  I don't know how that score
> was
> > arrived at, which rules were triggered, and the specific values that
> caused
> > a rule to trigger.
> >
> > More specifically, there is no way to generate a message that looks like
> > "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> > threshold of '202'".
> >
> >
> > 2. *Triage Calculated Values from the Profiler*
> >
> > Also, the value being interrogated here, the number of inbound flows, is
> > not a static value contained within any single telemetry message.  This
> > value is calculated across multiple messages by the Profiler.  The
> current
> > Threat Triage process cannot be used to interrogate values calculated by
> > the Profiler.
> >
> >
> > To try and keep this email concise and digestible, I am going to send a
> > follow-on discussing proposed solutions for each of these separately.
> >
> >
> >
> >
> >
> >
>
>
> --
> Nick Allen <ni...@nickallen.org>
>

Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
I would be open to doing it this way.  It makes the implementation simpler.

The only problem with that is that if I want to just use a plain string, I
would have to embed quotes.

"comment": " 'This is my rule comment with no values'"


Not really a big deal though.
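
In full context, a static comment under that convention would end up
looking something like this (just a sketch of the extra quoting, not a
real rule):

"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Abnormal Value",
      "comment" : "'This is my rule comment with no values'",
      "rule" : "value > value_threshold",
      "score" : 10
    }
  ],
  "aggregator" : "MAX"
}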



On Wed, Feb 1, 2017 at 2:28 PM, Casey Stella <ce...@gmail.com> wrote:

> I like the direction.  One thing that we may want is for comment to just be
> a stellar expression and construct a function to essentially do
> String.format().  So, that'd become:
> "triageConfig" : {
>   "riskLevelRules" : [
>     {
>       "name" : "Abnormal Value",
>       "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
> hostname, value, value_threshold)"
>       "rule" : "value > value_threshold",
>       "score" : 10
>     }
>   ],
>   "aggregator" : "MAX"
> }
>
> The reason:
>
>    - It's integrated and stellar is our default scripting layer
>    - It supports doing some computation in the message
>
>
> On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:
>
> > Like I said, here is a proposed solution to one of the gaps I identified
> in
> > the previous email.
> >
> > *Problem*
> >
> > There is little transparency into the Threat Triage process itself.  When
> > Threat Triage runs, all I get is a score.  I don't know how that score
> was
> > arrived at, which rules were triggered, and the specific values that
> caused
> > a rule to trigger.
> >
> > More specifically, there is no way to generate a message that looks like
> > "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> > threshold of '202'".  This makes it difficult for an analyst to action
> the
> > alert.
> >
> > *Proposed Solution*
> >
> > To improve the transparency of the Threat Triage process, I am proposing
> > these enhancements.
> >
> > 1. Threat Triage should attach to each message all of the rules that
> fired
> > in addition to the total calculated threat triage score.
> >
> > 2. Threat Triage should allow a custom message to be generated for each
> > rule.  The custom message would allow for some form of string
> interpolation
> > so that I can add specific values from each message to the generated
> > alert.  We could allow this in one or both of the new fields that Casey
> > just added, name and comment.
> >
> >
> > *Example*
> >
> > 1. In this example, we have a telemetry message with a field called
> 'value'
> > that we need to monitor.  In Enrichment, I calculate some sort of value
> > threshold, over which an alert should be generated.
> >
> >
> > 2. In Threat Triage, I use the calculated value threshold to alert on any
> > message that has a value exceeding this threshold.
> >
> > 3. I can embed values from the message, like the hostname, value, and
> value
> > threshold, into the alert produced by Threat Triage.  Notice that I am
> > using ${this} for string interpolation, but it could be any syntax that
> we
> > choose.
> >
> >
> > "triageConfig" : {
> >   "riskLevelRules" : [
> >     {
> >       "name" : "Abnormal Value",
> >       "comment" : "For ${hostname}; the value ${value} exceeds threshold
> of
> > ${value_threshold}",
> >       "rule" : "value > value_threshold",
> >       "score" : 10
> >     }
> >   ],
> >   "aggregator" : "MAX"
> > }
> >
> >
> > 4. The Threat Triage process today would add only the total calculated
> > score.
> >
> > "threat.triage.level": 10.0
> >
> >
> > With this proposal, Threat Triage would add the following to the message.
> >
> > Notice how each of the ${variables} have been replaced with the actual
> > values extracted from the message.  This allows for more contextual
> > information to action the alert.
> >
> > "threat.triage": {
> >     "score": 10.0,
> >     "rules": [
> >       {
> >         "name": "Abnormal Value",
> >         "comment" : "For 10.0.0.1; the value 101 exceeds threshold of
> 42",
> >         "score" : 10
> >       }
> >     ]
> > }
> >
> >
> >
> > What do you think?  Any alternative ideas?
> >
> >
> >
> > On Wed, Feb 1, 2017 at 2:11 PM, Nick Allen <ni...@nickallen.org> wrote:
> >
> > > I'd like to explore the functionality that we have in Metron using a
> > > motivating example.  I think this will help highlight some gaps where
> we
> > > can enhance Metron.
> > >
> > > The motivating example is that I would like to create an alert if the
> > > number of inbound flows to any host over a 15 minute interval is
> > abnormal.
> > > I would like the alert to contain the specific information below to
> > > streamline the triage process.
> > >
> > > Rule: Abnormal number of inbound flows
> > > Bin: 15 mins
> > > Alert: The host 'powned.svr.bank.com' has '230' inbound flows,
> exceeding
> > > the threshold of '202'
> > >
> > >
> > > *What Works*
> > >
> > > In some ways, this example is similar to the "Outlier Detection" demo
> > that
> > > I performed with the Profiler a few months back.   We have most of what
> > we
> > > need to do this with a couple caveats.
> > >
> > > 1. An enrichment would be added to enrich the message with the correct
> > > internal hostname 'powned.svr.bank.com'.
> > >
> > > 2. With the Profiler, I can capture some idea of what "normal" is for
> the
> > > number of inbound flows across 15 minute intervals.
> > > 3. With Threat Triage, I can create rules that alert when a value
> exceeds
> > > what the Profiler defines as normal.
> > >
> > >
> > > *What's Missing*
> > >
> > > Its nice to know that we are almost all the way there with this
> example.
> > > Unfortunately, there are two gaps that fall out of this.
> > >
> > >  1. *Threat Triage Transparency*
> > >
> > > There is little transparency into the Threat Triage process itself.
> When
> > > Threat Triage runs, all I get is a score.  I don't know how that score
> > was
> > > arrived at, which rules were triggered, and the specific values that
> > caused
> > > a rule to trigger.
> > >
> > > More specifically, there is no way to generate a message that looks
> like
> > > "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> > > threshold of '202'".
> > >
> > >
> > > 2. *Triage Calculated Values from the Profiler*
> > >
> > > Also, the value being interrogated here, the number of inbound flows,
> is
> > > not a static value contained within any single telemetry message.  This
> > > value is calculated across multiple messages by the Profiler.  The
> > current
> > > Threat Triage process cannot be used to interrogate values calculated
> by
> > > the Profiler.
> > >
> > >
> > > To try and keep this email concise and digestible, I am going to send a
> > > follow-on discussing proposed solutions for each of these separately.
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Nick Allen <ni...@nickallen.org>
> >
>



-- 
Nick Allen <ni...@nickallen.org>

Re: [Discuss] Improve Alerting

Posted by Casey Stella <ce...@gmail.com>.
I like the direction.  One thing that we may want is for comment to just be
a stellar expression, and to construct a function that essentially does
String.format().  So, that'd become:
"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Abnormal Value",
      "comment" : "FORMAT('For %s; the value %s exceeds threshold of %d',
hostname, value, value_threshold)",
      "rule" : "value > value_threshold",
      "score" : 10
    }
  ],
  "aggregator" : "MAX"
}

The reason:

   - It's integrated and stellar is our default scripting layer
   - It supports doing some computation in the message


On Wed, Feb 1, 2017 at 2:21 PM, Nick Allen <ni...@nickallen.org> wrote:

> Like I said, here is a proposed solution to one of the gaps I identified in
> the previous email.
>
> *Problem*
>
> There is little transparency into the Threat Triage process itself.  When
> Threat Triage runs, all I get is a score.  I don't know how that score was
> arrived at, which rules were triggered, and the specific values that caused
> a rule to trigger.
>
> More specifically, there is no way to generate a message that looks like
> "The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
> threshold of '202'".  This makes it difficult for an analyst to action the
> alert.
>
> *Proposed Solution*
>
> To improve the transparency of the Threat Triage process, I am proposing
> these enhancements.
>
> 1. Threat Triage should attach to each message all of the rules that fired
> in addition to the total calculated threat triage score.
>
> 2. Threat Triage should allow a custom message to be generated for each
> rule.  The custom message would allow for some form of string interpolation
> so that I can add specific values from each message to the generated
> alert.  We could allow this in one or both of the new fields that Casey
> just added, name and comment.
>
>
> *Example*
>
> 1. In this example, we have a telemetry message with a field called 'value'
> that we need to monitor.  In Enrichment, I calculate some sort of value
> threshold, over which an alert should be generated.
>
>
> 2. In Threat Triage, I use the calculated value threshold to alert on any
> message that has a value exceeding this threshold.
>
> 3. I can embed values from the message, like the hostname, value, and value
> threshold, into the alert produced by Threat Triage.  Notice that I am
> using ${this} for string interpolation, but it could be any syntax that we
> choose.
>
>
> "triageConfig" : {
>   "riskLevelRules" : [
>     {
>       "name" : "Abnormal Value",
>       "comment" : "For ${hostname}; the value ${value} exceeds threshold of
> ${value_threshold}",
>       "rule" : "value > value_threshold",
>       "score" : 10
>     }
>   ],
>   "aggregator" : "MAX"
> }
>
>
> 4. The Threat Triage process today would add only the total calculated
> score.
>
> "threat.triage.level": 10.0
>
>
> With this proposal, Threat Triage would add the following to the message.
>
> Notice how each of the ${variables} have been replaced with the actual
> values extracted from the message.  This allows for more contextual
> information to action the alert.
>
> "threat.triage": {
>     "score": 10.0,
>     "rules": [
>       {
>         "name": "Abnormal Value",
>         "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
>         "score" : 10
>       }
>     ]
> }
>
>
>
> What do you think?  Any alternative ideas?
>
>
>
> --
> Nick Allen <ni...@nickallen.org>
>

Re: [Discuss] Improve Alerting

Posted by Nick Allen <ni...@nickallen.org>.
Like I said, here is a proposed solution to one of the gaps I identified in
the previous email.

*Problem*

There is little transparency into the Threat Triage process itself.  When
Threat Triage runs, all I get is a score.  I don't know how that score was
arrived at, which rules were triggered, and the specific values that caused
a rule to trigger.

More specifically, there is no way to generate a message that looks like
"The host 'powned.svr.bank.com' has '230' inbound flows, exceeding the
threshold of '202'".  This makes it difficult for an analyst to action the
alert.

*Proposed Solution*

To improve the transparency of the Threat Triage process, I am proposing
these enhancements.

1. Threat Triage should attach to each message all of the rules that fired,
in addition to the total calculated threat triage score.

2. Threat Triage should allow a custom message to be generated for each
rule.  The custom message would allow for some form of string interpolation
so that I can add specific values from each message to the generated
alert.  We could allow this in one or both of the new fields that Casey
just added: 'name' and 'comment'.


*Example*

1. In this example, we have a telemetry message with a field called 'value'
that we need to monitor.  In Enrichment, I calculate some sort of value
threshold, over which an alert should be generated.
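
To make this step concrete, here is a rough sketch of what that enrichment
might look like, assuming the Profiler already captures the value in a
profile.  The profile name 'inbound_flows', the derived fields 'flow_stats'
and 'value_threshold', and the mean-plus-three-standard-deviations
threshold are all hypothetical, and the exact PROFILE_GET arguments depend
on the Metron version in use.

"enrichment" : {
  "fieldMap" : {
    "stellar" : {
      "config" : {
        "flows" : {
          "flow_stats" : "STATS_MERGE(PROFILE_GET('inbound_flows', hostname, 4, 'HOURS'))",
          "value_threshold" : "STATS_MEAN(flow_stats) + 3 * STATS_SD(flow_stats)"
        }
      }
    }
  }
}

Whatever gets calculated here, like 'value_threshold', becomes a field on
the message, which is what allows the Threat Triage rule in the next step
to reference it.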


2. In Threat Triage, I use the calculated value threshold to alert on any
message that has a value exceeding this threshold.

3. I can embed values from the message, like the hostname, value, and value
threshold, into the alert produced by Threat Triage.  Notice that I am
using ${this} for string interpolation, but it could be any syntax that we
choose.


"triageConfig" : {
  "riskLevelRules" : [
    {
      "name" : "Abnormal Value",
      "comment" : "For ${hostname}; the value ${value} exceeds threshold of
${value_threshold}",
      "rule" : "value > value_threshold",
      "score" : 10
    }
  ],
  "aggregator" : "MAX"
}


4. The Threat Triage process today would add only the total calculated
score.

"threat.triage.level": 10.0


With this proposal, Threat Triage would add the following to the message.

Notice how each of the ${variables} has been replaced with the actual
values extracted from the message.  This allows for more contextual
information to action the alert.

"threat.triage": {
    "score": 10.0,
    "rules": [
      {
        "name": "Abnormal Value",
        "comment" : "For 10.0.0.1; the value 101 exceeds threshold of 42",
        "score" : 10
      }
    ]
}
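
For reference, that output assumes the enriched message carried values
along these lines.  The numbers are taken straight from the interpolated
comment above and are purely illustrative; only the fields the rule
touches are shown.

{
  "hostname" : "10.0.0.1",
  "value" : 101,
  "value_threshold" : 42
}

The rule 'value > value_threshold' evaluates to true (101 > 42), so the
rule fires, its comment is interpolated with those values, and the MAX
aggregator yields the overall score of 10.0.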



What do you think?  Any alternative ideas?





-- 
Nick Allen <ni...@nickallen.org>