Posted to dev@heron.apache.org by Saikat Kanjilal <sx...@hotmail.com> on 2018/05/07 21:31:55 UTC

[DISCUSS] A design proposal for incorporating machine learning algorithms into heron

Hello Dev community,

I have created the initial API design document for building Storm topologies around a set of streaming machine learning algorithms here: https://docs.google.com/document/d/1LrO7XRcMxJoMM83wjRd-Ov74VAaomA_mXOAhCStgGng/edit?usp=sharing. This is very much a work in progress, but I wanted to start getting early feedback from the community, since representing a streaming ML pipeline in Heron involves a lot of complex operations. The design uses Apache SAMOA to decide which algorithms to focus on bringing into Heron.

Thank you, Karthik Ramasamy, for your mentoring on this. The goal is to represent all the phase 1 algorithms as Storm topologies and then evolve toward a Streamlet-based architecture. I would really appreciate some feedback from the community.

While you are commenting on the initial approach, I will: 1) finish the design for the rest of the phase 1 algorithms, and 2) start the design for a Heron Streamlet-based architecture that runs on top of the Storm-based topologies.

I look forward to a productive discussion around the design.

Re: [DISCUSS] A design proposal for incorporating machine learning algorithms into heron

Posted by Ning Wang <wa...@gmail.com>.

My thoughts:

1. Sounds good!
2. I feel it might be better to keep it separate so we can focus on one problem at a time.
3. I feel it depends on how hard it would be to add in the future.
4. Not sure.


On Wed, May 9, 2018 at 7:39 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:

> FYI for those who don't know about Michelangelo: https://eng.uber.com/michelangelo/

Re: [DISCUSS] A design proposal for incorporating machine learning algorithms into heron

Posted by Saikat Kanjilal <sx...@hotmail.com>.

FYI for those who don't know about Michelangelo: https://eng.uber.com/michelangelo/


Meet Michelangelo: Uber's Machine Learning Platform<https://eng.uber.com/michelangelo/>
eng.uber.com
Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.




________________________________
From: Saikat Kanjilal <sx...@hotmail.com>
Sent: Wednesday, May 9, 2018 7:35 AM
To: dev@heron.incubator.apache.org; Karthik Ramasamy
Subject: Re: [DISCUSS] A design proposal for incorporating machine learning algorithms into heron

Hi Folks,

I was thinking about how to drive this initiative and have some ideas around execution. I would love some feedback:

1) While the design discussion is happening, I was thinking of building a small prototype with one of the algorithms. The prototype would be a first-cut representation of the design, in which we represent one algorithm as a Storm topology. Looking at the list of algorithms we're thinking about bringing over from SAMOA (https://samoa.incubator.apache.org/documentation/SAMOA-and-Machine-Learning.html), distributed stream clustering looks the most valuable for a prototype. Thoughts?
Apache SAMOA and Machine Learning<https://samoa.incubator.apache.org/documentation/SAMOA-and-Machine-Learning.html>
samoa.incubator.apache.org
Apache SAMOA and Machine Learning. SAMOA’s main goal is to help developers to create easily machine learning algorithms on top of any distributed stream processing engine.




2) I would like to leverage some of the ideas in Michelangelo, as well as my previous experience building a tool that versions and deploys ML models and associates them with newly arriving windows of data. In practice, I feel this is a completely orthogonal initiative that we also need to design. Should it be part of the design doc at this point? Thoughts?

3) Should we address the security of streaming machine learning models in the first release?

4) The design doc mentions a GenericMLOutputModelSink. I was thinking of it as a factory method with underlying representations of various existing sinks that I'm hoping to leverage; see here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_storm-component-guide/content/ch_storm-connectors.html
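
To make item 1 a little more concrete, here is a rough sketch of the per-tuple logic a clustering prototype might start from. It is an assumption, not part of the design doc: it uses plain sequential k-means as a simplified stand-in for SAMOA's distributed stream clustering, and the names (OnlineKMeans, update, centroid) are hypothetical. It uses no Storm or Heron APIs, so it runs on its own.

```java
import java.util.Arrays;

/**
 * Hypothetical clustering core that a prototype bolt could call once per tuple.
 * Sequential (MacQueen-style) k-means: each point is assigned to its nearest
 * centroid, and that centroid becomes the running mean of the points it absorbed.
 * Seeds carry zero weight, so the first point assigned to a centroid replaces it.
 * Simplified stand-in only, NOT SAMOA's distributed stream clustering algorithm.
 */
final class OnlineKMeans {
    private final double[][] centroids;
    private final long[] counts; // points absorbed by each centroid so far

    OnlineKMeans(double[][] seeds) {
        centroids = new double[seeds.length][];
        for (int i = 0; i < seeds.length; i++) {
            centroids[i] = Arrays.copyOf(seeds[i], seeds[i].length);
        }
        counts = new long[seeds.length];
    }

    /** Index of the centroid with the smallest squared Euclidean distance to the point. */
    int nearest(double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centroids[i][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Assigns the point, updates the winning centroid's running mean, returns its index. */
    int update(double[] point) {
        int c = nearest(point);
        counts[c]++;
        double rate = 1.0 / counts[c];
        for (int j = 0; j < point.length; j++) {
            centroids[c][j] += rate * (point[j] - centroids[c][j]);
        }
        return c;
    }

    /** Defensive copy of centroid i, e.g. for periodically emitting the model to a sink. */
    double[] centroid(int i) {
        return Arrays.copyOf(centroids[i], centroids[i].length);
    }
}
```

In a Storm/Heron prototype, a bolt could parse each tuple into a double[], call update, and emit the returned cluster index. Running several parallel instances (the "distributed" part) would additionally need a step that merges their centroids, which this sketch does not cover.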

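For item 2, one possible shape for associating model versions with incoming windows is a registry keyed by activation time. This is a hypothetical sketch: ModelRegistry, deploy and modelForWindow are illustrative names, not existing Heron or Michelangelo APIs. It assumes each window is identified by its start time in epoch milliseconds and should be scored by the newest version deployed at or before that instant, which java.util.TreeMap.floorEntry returns directly.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical registry that records when each model version went live, so an
 * incoming window of data can be scored by the version active at the window's start.
 */
final class ModelRegistry<M> {
    /** A deployed model together with its version label. */
    static final class Deployment<T> {
        final String version;
        final T model;

        Deployment(String version, T model) {
            this.version = version;
            this.model = model;
        }
    }

    // Activation time (epoch millis) -> deployment that is live from that instant on.
    private final TreeMap<Long, Deployment<M>> byActivationTime = new TreeMap<>();

    void deploy(long activeFromMillis, String version, M model) {
        byActivationTime.put(activeFromMillis, new Deployment<>(version, model));
    }

    /** Latest deployment activated at or before windowStartMillis, if there is one. */
    Optional<Deployment<M>> modelForWindow(long windowStartMillis) {
        Map.Entry<Long, Deployment<M>> entry = byActivationTime.floorEntry(windowStartMillis);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }
}
```

One design question this surfaces is what to do with a window that straddles a deployment; the sketch simply uses whichever version was live at the window's start.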

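For item 4, this is one reading of GenericMLOutputModelSink as a factory method. Everything except that class name is hypothetical: the ModelSink interface, the create signature, the sink type strings, and the two stand-in sinks (in-memory and stdout logging) are placeholders for wrappers around the existing connectors in the Hortonworks guide, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical contract every model sink implements. */
interface ModelSink {
    void write(String modelId, byte[] serializedModel);
}

/** Stand-in for a real connector wrapper; keeps a record of writes in memory. */
final class InMemoryModelSink implements ModelSink {
    final List<String> written = new ArrayList<>();

    @Override
    public void write(String modelId, byte[] serializedModel) {
        written.add(modelId + ":" + serializedModel.length);
    }
}

/** Stand-in that logs to stdout, useful for local testing. */
final class LoggingModelSink implements ModelSink {
    @Override
    public void write(String modelId, byte[] serializedModel) {
        System.out.println("model " + modelId + " (" + serializedModel.length + " bytes)");
    }
}

/**
 * Hypothetical factory: callers name a sink type and get back an implementation,
 * without depending on the concrete connector classes.
 */
final class GenericMLOutputModelSink {
    private GenericMLOutputModelSink() {}

    static ModelSink create(String sinkType) {
        switch (sinkType.toLowerCase(Locale.ROOT)) {
            case "memory":
                return new InMemoryModelSink();
            case "log":
                return new LoggingModelSink();
            default:
                throw new IllegalArgumentException("Unknown sink type: " + sinkType);
        }
    }
}
```

Because callers depend only on ModelSink, wrapping another existing connector would mean one new class plus one new case in create.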

@Karthik Ramasamy<ma...@streaml.io> et al., I would love to get thoughts on how we proceed with this initiative at this point. In the meantime, I will get started on item 1 to test the feasibility of this design.

Regards

Chapter 5. Moving Data Into and Out of Apache Storm Using ...<https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_storm-component-guide/content/ch_storm-connectors.html>
docs.hortonworks.com
This chapter focuses on moving data into and out of Apache Storm through the use of spouts and bolts. Spouts read data from external sources to ingest data into a topology.






