Posted to user@metron.apache.org by moshe jarusalem <tu...@gmail.com> on 2017/12/06 07:45:41 UTC

machine learning libraries supported

Hi All,
Could you please point me to some documentation on which machine learning
libraries can be used in the Metron architecture, and how? Any examples
would be appreciated.

regards,

Re: machine learning libraries supported

Posted by Otto Fowler <ot...@gmail.com>.
Right now, you can look at MaaS (Model as a Service) for plugging in machine
learning services.

If you want to use Spark, and you have it on your cluster, you could write
your own Spark drivers, have them pull from the
Kafka topics (indexing, for example), and run your Spark jobs there.
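
As a rough sketch of the per-message work such a driver would do, the snippet
below decodes JSON messages and counts events per source address. It is not
taken from Metron or Spark: the Kafka or Spark consumer is left out entirely
(a plain list stands in for the topic), the helper names are hypothetical, and
the field names ip_src_addr, ip_dst_addr and source.type are assumptions about
what the indexing topic carries.

```python
import json
from collections import Counter


def parse_message(raw):
    """Decode one JSON message; return None if it is not a JSON object."""
    try:
        msg = json.loads(raw)
    except (TypeError, ValueError):
        return None
    return msg if isinstance(msg, dict) else None


def count_by_field(raw_messages, key="ip_src_addr"):
    """Count messages per value of `key`, skipping undecodable ones."""
    counts = Counter()
    for raw in raw_messages:
        msg = parse_message(raw)
        if msg is not None and key in msg:
            counts[msg[key]] += 1
    return counts


# Stand-in for messages pulled from the topic (assumed field names).
sample = [
    '{"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.9", "source.type": "bro"}',
    '{"ip_src_addr": "10.0.0.1", "ip_dst_addr": "10.0.0.7", "source.type": "bro"}',
    '{"ip_src_addr": "10.0.0.2", "source.type": "snort"}',
    'not json',
]
counts = count_by_field(sample)
```

In a real driver the list would be replaced by whatever iterator your Kafka
or Spark integration provides.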


On December 7, 2017 at 03:37:00, moshe jarusalem (tuutdo@gmail.com) wrote:

Hi all,

ping


Re: machine learning libraries supported

Posted by moshe jarusalem <tu...@gmail.com>.
Hi all,

ping

On Wed, Dec 6, 2017 at 1:23 PM, Gaurav Bapat <ga...@gmail.com> wrote:

> Hi Moshe,
>
> I would also like to know about ML libraries on Metron. I think Spark might
> help, but I don't know how I would set up Metron.
>
> Be in touch!!
>
> Thank You,
> Gaurav
>

Re: machine learning libraries supported

Posted by Gaurav Bapat <ga...@gmail.com>.
Hi Moshe,

I would also like to know about ML libraries on Metron. I think Spark might
help, but I don't know how I would set up Metron.

Be in touch!!

Thank You,
Gaurav

On 6 December 2017 at 13:15, moshe jarusalem <tu...@gmail.com> wrote:

> Hi All,
> Could you please point me to some documentation on which machine learning
> libraries can be used in the Metron architecture, and how? Any examples
> would be appreciated.
>
> regards,
>
>

Re: machine learning libraries supported

Posted by Otto Fowler <ot...@gmail.com>.
Simon,
What do you think a good example of Python, Spark and MaaS would look like?


On December 7, 2017 at 07:56:00, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

I would recommend starting out with something like Spark, but the short
answer is that anything that will run inside a YARN container will work, so
the answer is most ML libraries.

Using Spark to train models on the historical store is a good bet, and you
can then use the trained models with Model as a Service (MaaS).

See
https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service
for information on models and some sample boilerplate for deploying your own
Python-based models.

You could, as some have suggested, use Spark Streaming, but to be honest, the
Spark ML models are not well suited to streaming use cases, and you would
be very much breaking the Metron flow rather than benefitting from elements
like MaaS (you’d basically be building a 100% custom side project, which
would be fine, but you’d miss a lot of the benefits of Metron that
way). If you do go down that route, I would strongly recommend having the
output of your streaming jobs feed back into a Metron sensor. To be honest
though, you’re much better off training in batch and scoring / inferring
via the Model as a Service approach.

Simon



Re: machine learning libraries supported

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
Spark’s ML models are primarily batch in nature. There is talk about incorporating things like naive Bayes and streaming k-means into Structured Streaming (which will require some schema work in Metron to make sense). These are still open issues that are not seeing a lot of progress in the Spark community.

The most common mistake I’ve seen using Spark Streaming with ML in the cyber world is people thinking that the FP-Growth association rule models can be learnt online, because there exists a class of streaming FP-Growth models. The Spark implementations of FP-Growth, however, rely on batch (mathematically!), and while they can technically be run on the micro-batches Spark Streaming provides, the results are not actually meaningful. Just because your model runs and gives you an output doesn’t mean it’s mathematically defensible to do so.
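
To make the micro-batch point concrete, here is a small sketch. It does not use Spark or FP-Growth itself; a brute-force pair counter is swapped in because it shows the same support arithmetic. The data, the function name and the 0.5 support threshold are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose relative support is at least min_support."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair has one canonical ordering.
        counts.update(combinations(sorted(items), 2))
    threshold = min_support * len(transactions)
    return {pair for pair, n in counts.items() if n >= threshold}


# Two micro-batches carved out of the same stream of transactions.
micro_batch_1 = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c", "d"}]
micro_batch_2 = [{"c", "d"}, {"c", "e"}, {"d", "e"}, {"c", "d"}]
full_batch = micro_batch_1 + micro_batch_2

# Mining each micro-batch on its own versus mining the whole history.
per_batch = [frequent_pairs(b, 0.5) for b in (micro_batch_1, micro_batch_2)]
overall = frequent_pairs(full_batch, 0.5)
```

Each micro-batch reports a "frequent" pair, yet over the full data neither pair reaches the threshold, so rules derived per micro-batch would describe local bursts rather than the data set as a whole.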

All that said...

Streaming inference makes some sense in Spark, but that’s probably better handled through MaaS in Metron, which will generalise to Spark and other libraries. And absolutely, use the Spark models and the ML pipelining to perform inference in a Spark job run with parallel instances in MaaS. Note that the reason for this is primarily that Spark is a data-parallel engine, whereas Metron MaaS applies task parallelism in order to reduce latency.

To the point of a good example of Python / Spark / MaaS / Metron, I would recommend taking a look at Casey’s blog at https://hortonworks.com/blog/model-service-modern-streaming-data-science-apache-metron/ which is a walkthrough on scoring a Python scikit-learn model in MaaS. For the Spark piece, I’ve seen a number of examples based on these same principles, using the Spark classes for scoring based on saved models produced by a batch trainer. Apologies, I don’t have any readily publishable examples of the whole thing, but I may work something synthetic up if it would be useful.

Simon

> On 7 Dec 2017, at 13:09, Martin Andreoni <ma...@gta.ufrj.br> wrote:
> 
> Hello Simon,
> 
> thanks for the information.
> 
> However, why do you say that the streaming models are not well suited?
> 
>> You could, as some have suggested, use Spark Streaming, but to be honest, the Spark ML models are not well suited to streaming use cases
> Is there a performance problem, or how would you justify that statement?
> 
> thanks

Re: machine learning libraries supported

Posted by Martin Andreoni <ma...@gta.ufrj.br>.
Hello Simon,

thanks for the information.

However, why do you say that the streaming models are not well suited?

> You could, as some have suggested, use Spark Streaming, but to be
> honest, the Spark ML models are not well suited to streaming use cases
Is there a performance problem, or how would you justify that statement?

thanks


On 07/12/2017 at 13:55, Simon Elliston Ball wrote:
> I would recommend starting out with something like Spark, but the
> short answer is that anything that will run inside a YARN container
> will work, so the answer is most ML libraries.
>
> Using Spark to train models on the historical store is a good bet, and
> you can then use the trained models with Model as a Service (MaaS).
>
> See
> https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service for
> information on models and some sample boilerplate for deploying your
> own Python-based models.
>
> You could, as some have suggested, use Spark Streaming, but to be
> honest, the Spark ML models are not well suited to streaming use
> cases, and you would be very much breaking the Metron flow rather than
> benefitting from elements like MaaS (you’d basically be building a
> 100% custom side project, which would be fine, but you’d miss a
> lot of the benefits of Metron that way). If you do go down that route,
> I would strongly recommend having the output of your streaming jobs
> feed back into a Metron sensor. To be honest though, you’re much better
> off training in batch and scoring / inferring via the Model as a
> Service approach.
>
> Simon
>
>

-- 
Martin Andreoni

PhD. Candidate at GTA/LIP6

UFRJ/UPMC

www.gta.ufrj.br/~martin


Re: machine learning libraries supported

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
I would recommend starting out with something like Spark, but the short answer is that anything that will run inside a YARN container will work, so the answer is most ML libraries.

Using Spark to train models on the historical store is a good bet, and you can then use the trained models with Model as a Service (MaaS).

See https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service for information on models and some sample boilerplate for deploying your own Python-based models.

You could, as some have suggested, use Spark Streaming, but to be honest, the Spark ML models are not well suited to streaming use cases, and you would be very much breaking the Metron flow rather than benefitting from elements like MaaS (you’d basically be building a 100% custom side project, which would be fine, but you’d miss a lot of the benefits of Metron that way). If you do go down that route, I would strongly recommend having the output of your streaming jobs feed back into a Metron sensor. To be honest though, you’re much better off training in batch and scoring / inferring via the Model as a Service approach.
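
As a rough illustration of the "train in batch, score via a service" pattern, the sketch below wraps a stand-in model behind a small REST endpoint using only the Python standard library. It is not the boilerplate from the repository linked above: the /apply path, the host query parameter, the is_malicious key and the token-based "model" are all assumptions chosen for the example, and registering the endpoint with MaaS is out of scope here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Stand-in for a model trained offline in batch; a real deployment would
# load a serialized model from disk here instead.
SUSPICIOUS_TOKENS = {"login", "verify", "update"}


def score(host):
    """Toy scoring rule: flag hosts containing a suspicious token."""
    host = (host or "").lower()
    return {"is_malicious": any(tok in host for tok in SUSPICIOUS_TOKENS)}


class ApplyHandler(BaseHTTPRequestHandler):
    """Serve GET /apply?host=... and answer with a JSON score."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/apply":
            self.send_error(404)
            return
        host = parse_qs(url.query).get("host", [""])[0]
        body = json.dumps(score(host)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), ApplyHandler)
```

Calling make_server(8080).serve_forever() would start the service locally; keeping score() separate from the handler means the same function can be unit tested or swapped for a model trained elsewhere.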

Simon


> On 6 Dec 2017, at 07:45, moshe jarusalem <tu...@gmail.com> wrote:
> 
> Hi All,
> Could you please point me to some documentation on which machine learning libraries can be used in the Metron architecture, and how? Any examples would be appreciated.
> 
> regards,
>