Posted to dev@spot.apache.org by Giacomo Bernardi <mi...@minux.it> on 2017/06/20 15:58:37 UTC

Re: Spot Suspicious Connects Description and questions related to 'feedback' from UI to ML

Hi Brandon and all,
I'm resuming this thread to check whether any thought has already been
given to the "streaming use case" mentioned below.

Are you planning to use streaming LDA in that case too? Or something
different (fancy RNNs? HTM?) to model the state of each IP?

Thanks,
Giacomo

On 25 May 2017 at 18:27, Edwards, Brandon <br...@intel.com> wrote:

> The Spot team feels that changes are needed to this ‘feedback’
> functionality, and sees these changes as happening concurrently with
> improvements to our ability to carry context from an LDA model trained on
> a given batch of data forward to the next training run (or even to
> training in a streaming use case). The value of ‘feedback’ depends on
> the quality of the model context we can carry over.

Re: Spot Suspicious Connects Description and questions related to 'feedback' from UI to ML

Posted by "Edwards, Brandon" <br...@intel.com>.

The idea would be to use the online optimizer: first train the model on a whole day’s worth of data to establish a foothold and find anomalies within that first day. From then on, minibatches would be brought in (in near real time) to further train the model and evaluate the most recent anomalies. Do you have thoughts on this topic, Giacomo? Are you hoping to contribute?

Brandon
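
The workflow described above (one full-day training pass, then repeated minibatch updates followed by scoring) can be sketched as below. This is only an illustration of the control flow, not Spot's implementation: the stand-in model is a smoothed word-frequency table rather than LDA, and the names `FrequencyModel`, `flag_anomalies`, `run`, the `threshold` value, and the example "words" are all hypothetical.

```python
from collections import Counter


class FrequencyModel:
    """Hypothetical stand-in for the topic model: smoothed word frequencies (NOT LDA)."""

    def __init__(self, smoothing=1.0):
        self.counts = Counter()
        self.total = 0
        self.smoothing = smoothing

    def update(self, words):
        """Fold a batch of words into the running counts."""
        self.counts.update(words)
        self.total += len(words)

    def probability(self, word):
        """Smoothed probability; one extra vocabulary slot is reserved for unseen words."""
        vocab = len(self.counts) + 1
        return (self.counts[word] + self.smoothing) / (self.total + self.smoothing * vocab)


def flag_anomalies(model, words, threshold):
    """Return the distinct words whose probability is below the threshold, sorted."""
    return sorted({w for w in words if model.probability(w) < threshold})


def run(day_of_data, minibatches, threshold=0.05):
    """Train on a full day first, then update on and score each minibatch in turn."""
    model = FrequencyModel()
    model.update(day_of_data)  # establish the model foothold
    results = [flag_anomalies(model, day_of_data, threshold)]
    for batch in minibatches:
        model.update(batch)  # further train on the newest data
        results.append(flag_anomalies(model, batch, threshold))
    return results
```

Calling `run(day, batches)` returns one list of flagged words for the first day and one per minibatch. Whether to score a minibatch before or after folding it into the model is a design choice; the sketch follows the order given in the message (train, then evaluate).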

On 6/20/17, 10:01 AM, "Giacomo Bernardi" <mi...@minux.it> wrote:

    Thanks.
    I wasn't referring to extra time-based series, but to the topic
    modelling and anomaly detection itself. So the plan is to use
    OnlineLDAOptimizer with mini-batches of the last (few?) minutes, then?
    
    G.
    
    
    On 20 June 2017 at 17:45, Edwards, Brandon <br...@intel.com> wrote:
    > Giacomo,
    > Spark has an online optimizer for LDA, which would enable the use of LDA in a mini-batch or streaming use case. However, if you are talking about machine learning that incorporates time-based features when looking for anomalies, we would like to explore this. It’s on the roadmap, but it is not being worked on right now. We have thought of adding new time-based features to the LDA model, and/or training additional time-series models to be combined with LDA in a model ensemble.
    > Brandon
    >
    > On 6/20/17, 8:58 AM, "Giacomo Bernardi" <mi...@minux.it> wrote:
    >
    >     Hi Brandon and all,
    >     I'm resuming this thread to check whether any thought has already been
    >     given to the "streaming use case" mentioned below.
    >
    >     Are you planning to use streaming LDA in that case too? Or something
    >     different (fancy RNNs? HTM?) to model the state of each IP?
    >
    >     Thanks,
    >     Giacomo
    >
    >
    >     On 25 May 2017 at 18:27, Edwards, Brandon <br...@intel.com> wrote:
    >
    >     > The Spot team feels that changes are needed to this ‘feedback’
    >     > functionality, and sees these changes as happening concurrently with
    >     > improvements to our ability to carry context from an LDA model trained on
    >     > a given batch of data forward to the next training run (or even to
    >     > training in a streaming use case). The value of ‘feedback’ depends on
    >     > the quality of the model context we can carry over.
    >
    >

Re: Spot Suspicious Connects Description and questions related to 'feedback' from UI to ML

Posted by Giacomo Bernardi <mi...@minux.it>.

Thanks.
I wasn't referring to extra time-based series, but to the topic
modelling and anomaly detection itself. So the plan is to use
OnlineLDAOptimizer with mini-batches of the last (few?) minutes, then?

G.


On 20 June 2017 at 17:45, Edwards, Brandon <br...@intel.com> wrote:
> Giacomo,
> Spark has an online optimizer for LDA, which would enable the use of LDA in a mini-batch or streaming use case. However, if you are talking about machine learning that incorporates time-based features when looking for anomalies, we would like to explore this. It’s on the roadmap, but it is not being worked on right now. We have thought of adding new time-based features to the LDA model, and/or training additional time-series models to be combined with LDA in a model ensemble.
> Brandon
>
> On 6/20/17, 8:58 AM, "Giacomo Bernardi" <mi...@minux.it> wrote:
>
>     Hi Brandon and all,
>     I'm resuming this thread to check whether any thought has already been
>     given to the "streaming use case" mentioned below.
>
>     Are you planning to use streaming LDA in that case too? Or something
>     different (fancy RNNs? HTM?) to model the state of each IP?
>
>     Thanks,
>     Giacomo
>
>
>     On 25 May 2017 at 18:27, Edwards, Brandon <br...@intel.com> wrote:
>
>     > The Spot team feels that changes are needed to this ‘feedback’
>     > functionality, and sees these changes as happening concurrently with
>     > improvements to our ability to carry context from an LDA model trained on
>     > a given batch of data forward to the next training run (or even to
>     > training in a streaming use case). The value of ‘feedback’ depends on
>     > the quality of the model context we can carry over.
>
>

Re: Spot Suspicious Connects Description and questions related to 'feedback' from UI to ML

Posted by "Edwards, Brandon" <br...@intel.com>.

Giacomo,
Spark has an online optimizer for LDA, which would enable the use of LDA in a mini-batch or streaming use case. However, if you are talking about machine learning that incorporates time-based features when looking for anomalies, we would like to explore this. It’s on the roadmap, but it is not being worked on right now. We have thought of adding new time-based features to the LDA model, and/or training additional time-series models to be combined with LDA in a model ensemble.
Brandon 
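
For readers unfamiliar with how an online LDA optimizer uses mini-batches: in the online variational Bayes scheme that Spark's OnlineLDAOptimizer is based on, each mini-batch produces a fresh estimate of the topic-word statistics, and the running statistics are moved toward it by a step size that shrinks over time. The sketch below shows only that update rule in plain Python. The function names are hypothetical, and the default `tau0` and `kappa` values mirror what I understand to be Spark's defaults, so treat them as assumptions rather than a reference.

```python
def learning_rate(iteration, tau0=1024.0, kappa=0.51):
    """Step size rho_t = (tau0 + t) ** -kappa; it shrinks as more mini-batches arrive."""
    return (tau0 + iteration) ** -kappa


def blend(old_stats, batch_stats, rho):
    """Move the running topic-word statistics a fraction rho toward the mini-batch estimate."""
    return [(1.0 - rho) * old + rho * new for old, new in zip(old_stats, batch_stats)]


# Usage: later mini-batches shift the running statistics less than early ones.
running = [10.0, 0.0]
for t, batch_estimate in enumerate([[0.0, 10.0], [0.0, 10.0]]):
    running = blend(running, batch_estimate, learning_rate(t))
```

A larger `tau0` damps the influence of the earliest mini-batches, and `kappa` controls how quickly older statistics are forgotten. These running statistics are one concrete form of the model context that could be carried from one training run to the next.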

On 6/20/17, 8:58 AM, "Giacomo Bernardi" <mi...@minux.it> wrote:

    Hi Brandon and all,
    I'm resuming this thread to check whether any thought has already been
    given to the "streaming use case" mentioned below.

    Are you planning to use streaming LDA in that case too? Or something
    different (fancy RNNs? HTM?) to model the state of each IP?

    Thanks,
    Giacomo

    On 25 May 2017 at 18:27, Edwards, Brandon <br...@intel.com> wrote:

    > The Spot team feels that changes are needed to this ‘feedback’
    > functionality, and sees these changes as happening concurrently with
    > improvements to our ability to carry context from an LDA model trained on
    > a given batch of data forward to the next training run (or even to
    > training in a streaming use case). The value of ‘feedback’ depends on
    > the quality of the model context we can carry over.