Posted to dev@beam.apache.org by Kam Kasravi <ka...@gmail.com> on 2016/05/13 16:54:13 UTC

machine learning API, common models

Hi

A number of readers have made comments on this topic recently. We have
created a document that does some analysis of common ML models and related
APIs. We hope this can drive an approach that will result in an API,
compatibility matrix and involvement from the same groups that are
implementing transformation runners (spark, flink, etc). We welcome
comments here or in the document itself.

https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing

RE: machine learning API, common models

Posted by "Kavulya, Soila P" <so...@intel.com>.

Hi Suneel,

The document is a work-in-progress to solicit feedback on the API so feel free to add to it.

Based on discussions with Tyler, the plan was to start by defining the types of data structures and transforms that we need to support in the high-level API (without a default implementation). Once that is done, we would then add lower-level ML algorithm support (e.g. iterative).

Soila
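
A minimal sketch, not taken from the thread or the linked document, of what "transforms in the high-level API without a default implementation" could look like: abstract interfaces that each runner or backing library would fill in. The names `Estimator`, `Transformer`, and `MeanCenterer` are hypothetical, and plain Python lists stand in for a distributed collection.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence


class Transformer(ABC):
    """Hypothetical high-level transform: maps feature rows to output rows."""

    @abstractmethod
    def transform(self, rows: Iterable[Sequence[float]]) -> List[List[float]]:
        ...


class Estimator(ABC):
    """Hypothetical high-level estimator: learns from rows, returns a Transformer."""

    @abstractmethod
    def fit(self, rows: Iterable[Sequence[float]]) -> Transformer:
        ...


class MeanCenterer(Estimator):
    """Toy backend-provided implementation, used only to exercise the interfaces."""

    def fit(self, rows):
        data = [list(r) for r in rows]
        n = len(data)
        # Column-wise means of the training rows.
        means = [sum(col) / n for col in zip(*data)]
        return _Centered(means)


class _Centered(Transformer):
    """Fitted model: subtracts the learned column means from each row."""

    def __init__(self, means):
        self.means = means

    def transform(self, rows):
        return [[x - m for x, m in zip(r, self.means)] for r in rows]
```

Because `Estimator` and `Transformer` carry no default behavior, instantiating them directly raises `TypeError`; only a concrete backend such as the toy `MeanCenterer` can be used.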

-----Original Message-----
From: Suneel Marthi [mailto:smarthi@apache.org] 
Sent: Tuesday, May 17, 2016 7:26 AM
To: mahout <de...@mahout.apache.org>; dev@beam.incubator.apache.org
Subject: Re: machine learning API, common models

I am curious as to why Oryx 2.0 and Mahout have been excluded from this doc. Any reasons?
Both projects have a good customer base and are being used in production.

On Tue, May 17, 2016 at 10:01 AM, Suneel Marthi <sm...@apache.org> wrote:

> Thanks Simone for pointing this out.
>
> On the Apache Mahout project we have distributed linear algebra with 
> R-like semantics that can be executed on Spark/Flink/H2O.
>
> @Kam: the document you point to is old and outdated; the most 
> up-to-date reference to the Samsara API is the book "Apache Mahout: 
> Beyond MapReduce". (shameless marketing here on behalf of fellow 
> committers :) )
>
> We added Flink DataSet API support in the recent Mahout 0.12.0 release 
> (April 11, 2016), and it was called out in my talk at ApacheBigData in 
> Vancouver last week.
>
> The Mahout community would definitely be interested in being involved 
> with this and sharing notes.
>
> IMHO, the focus should be first on building a good linalg foundation 
> before embarking on building algos and pipelines. Adding @dlyubimov to this.
>
>
>
> ---------- Forwarded message ----------
> From: Simone Robutti <si...@radicalbit.io>
> Date: Tue, May 17, 2016 at 9:48 AM
> Subject: Fwd: machine learning API, common models
> To: Suneel Marthi <sm...@apache.org>
>
>
>
> ---------- Forwarded message ----------
> From: Kavulya, Soila P <so...@intel.com>
> Date: 2016-05-17 1:53 GMT+02:00
> Subject: RE: machine learning API, common models
> To: "dev@beam.incubator.apache.org" <de...@beam.incubator.apache.org>
>
>
> Thanks Simone,
>
> You have raised a valid concern about how different frameworks will 
> have different implementations and parameter semantics for the same 
> algorithm. I agree that it is important to keep this in mind. 
> Hopefully, through this exercise, we will identify a good set of 
> common ML abstractions across different frameworks.
>
> Feel free to edit the document. We had limited the first pass of the 
> comparison matrix to the machine learning pipeline APIs, but we can 
> extend it to include other ML building blocks like linear algebra 
> operations, and APIs for optimizers like gradient descent.
>
> Soila
>
> -----Original Message-----
> From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> Sent: Monday, May 16, 2016 8:22 AM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Thanks Simone - yes I had read your concerns on dev and I think 
> they're well founded.
> Thanks for the Samsara reference - I've been looking at the 
> spark/scala bindings 
> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf.
>
> I think we should expand the document to include linear algebraic ops 
> or at least pay due diligence to it. If you're doing anything on the 
> flink side in this regard let us know, or feel free to suggest edits/updates to the document.
>
> Thanks
> Kam
>
> On Mon, May 16, 2016 at 6:05 AM, Simone Robutti < 
> simone.robutti@radicalbit.io> wrote:
>
> > Hello,
> >
> > I'm Simone and I just began contributing to Flink ML (actually on 
> > the distributed linalg part). I already expressed my concerns about 
> > the idea of a high-level API relying on specific frameworks' implementations:
> > different implementations produce different results and may vary in 
> > quality. Also the semantics of parameters may change from one 
> > implementation to the other. This could hinder portability and 
> > transparency. I believe these problems could be handled paying the 
> > due attention to the details of every single implementation but I 
> > invite you not to underestimate these problems.
> >
> > On the other hand the API in itself looks good to me. From my side, 
> > I hope to fill some of the gaps in Flink you underlined in the 
> > comparison
> matrix.
> >
> > Talking about matrices, proper matrices this time, I believe it 
> > would be useful to include in this API support for linear algebra operations.
> > Something similar is already present in Mahout's Samsara and it 
> > looks really good but clearly a similar implementation on Beam would 
> > be way more interesting and powerful.
> >
> > My 2 cents,
> >
> > Simone
> >
> >
> > 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
> >
> > > Hi Tyler,
> > >
> > > Thank you so much for your feedback. I agree that starting with 
> > > the high-level API is a good direction. We are interested in 
> > > Python because
> > it
> > > is the language that our data scientists are most familiar with. I 
> > > think starting with Java would be the best approach, because the 
> > > Python API can be a thin wrapper for Java API.
> > >
> > > In Spark, the Scala, Java and Python APIs are identical. Flink 
> > > does not have a Python API for ML pipelines at present.
> > >
> > > Could you point me to the updated runner API?
> > >
> > > Soila
> > >
> > > -----Original Message-----
> > > From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > > Sent: Friday, May 13, 2016 6:34 PM
> > > To: dev@beam.incubator.apache.org
> > > Subject: Re: machine learning API, common models
> > >
> > > Hi Kam & Soila,
> > >
> > > Thanks a lot for writing this up. I ran the doc past some of the 
> > > folks who've been doing ML work here at Google, and they were 
> > > generally happy with the distillation of common methods in the doc.
> > > I'd be curious to
> > hear
> > > what folks on the Flink- and Spark- runner sides think.
> > >
> > > To me, this seems like a good direction for a high-level API.
> > > Presumably, once a high-level API is in place, we could begin 
> > > looking at what it
> > would
> > > take to add lower-level ML algorithm support (e.g. iterative) to 
> > > the Beam Model. Is this essentially what you're thinking?
> > >
> > > Some more specific questions/comments:
> > >
> > >    - Presumably you'd want to tackle this in Java first, since 
> > > that's
> the
> > >    only language we currently support? Given that half of your 
> > > examples are in
> > >    Python, I'm also assuming Python will be interesting once it's 
> > > available.
> > >
> > >    - Along those lines, what languages are represented in the
> capability
> > >    matrix? E.g. is Spark ML support as detailed there identical across
> > >    Java/Scala and Python?
> > >
> > >    - Have you thought about how this would tie in at the runner level,
> > >    particularly given the updated Runner API changes that are coming?
> I'm
> > >    assuming they'd be provided as composite transforms that (for
> > > now)
> > would
> > >    have no default implementation, given the lack of low-level 
> > > primitives for
> > >    ML algorithms, but am curious what your thoughts are there.
> > >
> > >    - I still don't fully understand how incremental updates due to
> model
> > >    drift would tie in at the API level. There's a comment thread 
> > > in the
> > doc
> > >    still open tracking this, so no need to comment here additionally.
> > Just
> > >    pointing it out as one of the things that stands out as 
> > > potentially having
> > >    API-level impacts to me that doesn't seem 100% fleshed out in 
> > > the doc yet
> > > >    (though that admittedly may just be my limited understanding 
> > > at this point
> > >    :-).
> > >
> > > -Tyler
> > >
> > >
> > >
> > >
> > > On Fri, May 13, 2016 at 10:48 AM Kam Kasravi 
> > > <ka...@gmail.com>
> > wrote:
> > >
> > > > Hi Tyler - my bad. Comments should be enabled now.
> > > >
> > > > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau 
> > > > <takidau@google.com.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > > I
> > > > seem
> > > > > to have view access only.
> > > > >
> > > > > -Tyler
> > > > >
> > > > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi 
> > > > > <ka...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > A number of readers have made comments on this topic recently.
> > > > > > We have created a document that does some analysis of common 
> > > > > > ML models and
> > > > > related
> > > > > > APIs. We hope this can drive an approach that will result in 
> > > > > > an API, compatibility matrix and involvement from the same 
> > > > > > groups that are implementing transformation runners (spark,
> flink, etc).
> > > > > > We welcome comments here or in the document itself.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>

Re: machine learning API, common models

Posted by Suneel Marthi <sm...@apache.org>.

I am curious as to why Oryx 2.0 and Mahout have been excluded from this
doc. Any reasons?
Both projects have a good customer base and are being used in production.

On Tue, May 17, 2016 at 10:01 AM, Suneel Marthi <sm...@apache.org> wrote:

> Thanks Simone for pointing this out.
>
> On the Apache Mahout project we have distributed linear algebra with
> R-like semantics that can be executed on Spark/Flink/H2O.
>
> @Kam: the document you point to is old and outdated; the most up-to-date
> reference to the Samsara API is the book "Apache Mahout: Beyond
> MapReduce". (shameless marketing here on behalf of fellow committers :) )
>
> We added Flink DataSet API support in the recent Mahout 0.12.0 release
> (April 11, 2016), and it was called out in my talk at ApacheBigData in
> Vancouver last week.
>
> The Mahout community would definitely be interested in being involved with
> this and sharing notes.
>
> IMHO, the focus should be first on building a good linalg foundation
> before embarking on building algos and pipelines. Adding @dlyubimov to this.
>
>
>
> ---------- Forwarded message ----------
> From: Simone Robutti <si...@radicalbit.io>
> Date: Tue, May 17, 2016 at 9:48 AM
> Subject: Fwd: machine learning API, common models
> To: Suneel Marthi <sm...@apache.org>
>
>
>
> ---------- Forwarded message ----------
> From: Kavulya, Soila P <so...@intel.com>
> Date: 2016-05-17 1:53 GMT+02:00
> Subject: RE: machine learning API, common models
> To: "dev@beam.incubator.apache.org" <de...@beam.incubator.apache.org>
>
>
> Thanks Simone,
>
> You have raised a valid concern about how different frameworks will have
> different implementations and parameter semantics for the same algorithm. I
> agree that it is important to keep this in mind. Hopefully, through this
> exercise, we will identify a good set of common ML abstractions across
> different frameworks.
>
> Feel free to edit the document. We had limited the first pass of the
> comparison matrix to the machine learning pipeline APIs, but we can extend
> it to include other ML building blocks like linear algebra operations, and
> APIs for optimizers like gradient descent.
>
> Soila
>
> -----Original Message-----
> From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> Sent: Monday, May 16, 2016 8:22 AM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Thanks Simone - yes I had read your concerns on dev and I think they're
> well founded.
> Thanks for the Samsara reference - I've been looking at the spark/scala
> bindings
> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf.
>
> I think we should expand the document to include linear algebraic ops or
> at least pay due diligence to it. If you're doing anything on the flink side
> in this regard let us know, or feel free to suggest edits/updates to the document.
>
> Thanks
> Kam
>
> On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> simone.robutti@radicalbit.io> wrote:
>
> > Hello,
> >
> > I'm Simone and I just began contributing to Flink ML (actually on the
> > distributed linalg part). I already expressed my concerns about the
> > idea of a high-level API relying on specific frameworks' implementations:
> > different implementations produce different results and may vary in
> > quality. Also the semantics of parameters may change from one
> > implementation to the other. This could hinder portability and
> > transparency. I believe these problems could be handled paying the due
> > attention to the details of every single implementation but I invite
> > you not to underestimate these problems.
> >
> > On the other hand the API in itself looks good to me. From my side, I
> > hope to fill some of the gaps in Flink you underlined in the comparison
> matrix.
> >
> > Talking about matrices, proper matrices this time, I believe it would
> > be useful to include in this API support for linear algebra operations.
> > Something similar is already present in Mahout's Samsara and it looks
> > really good but clearly a similar implementation on Beam would be way
> > more interesting and powerful.
> >
> > My 2 cents,
> >
> > Simone
> >
> >
> > 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
> >
> > > Hi Tyler,
> > >
> > > Thank you so much for your feedback. I agree that starting with the
> > > high-level API is a good direction. We are interested in Python
> > > because
> > it
> > > is the language that our data scientists are most familiar with. I
> > > think starting with Java would be the best approach, because the
> > > Python API can be a thin wrapper for Java API.
> > >
> > > In Spark, the Scala, Java and Python APIs are identical. Flink does
> > > not have a Python API for ML pipelines at present.
> > >
> > > Could you point me to the updated runner API?
> > >
> > > Soila
> > >
> > > -----Original Message-----
> > > From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > > Sent: Friday, May 13, 2016 6:34 PM
> > > To: dev@beam.incubator.apache.org
> > > Subject: Re: machine learning API, common models
> > >
> > > Hi Kam & Soila,
> > >
> > > Thanks a lot for writing this up. I ran the doc past some of the
> > > folks who've been doing ML work here at Google, and they were
> > > generally happy with the distillation of common methods in the doc.
> > > I'd be curious to
> > hear
> > > what folks on the Flink- and Spark- runner sides think.
> > >
> > > To me, this seems like a good direction for a high-level API.
> > > Presumably, once a high-level API is in place, we could begin
> > > looking at what it
> > would
> > > take to add lower-level ML algorithm support (e.g. iterative) to the
> > > Beam Model. Is this essentially what you're thinking?
> > >
> > > Some more specific questions/comments:
> > >
> > >    - Presumably you'd want to tackle this in Java first, since that's
> the
> > >    only language we currently support? Given that half of your
> > > examples are in
> > >    Python, I'm also assuming Python will be interesting once it's
> > > available.
> > >
> > >    - Along those lines, what languages are represented in the
> capability
> > >    matrix? E.g. is Spark ML support as detailed there identical across
> > >    Java/Scala and Python?
> > >
> > >    - Have you thought about how this would tie in at the runner level,
> > >    particularly given the updated Runner API changes that are coming?
> I'm
> > >    assuming they'd be provided as composite transforms that (for
> > > now)
> > would
> > >    have no default implementation, given the lack of low-level
> > > primitives for
> > >    ML algorithms, but am curious what your thoughts are there.
> > >
> > >    - I still don't fully understand how incremental updates due to
> model
> > >    drift would tie in at the API level. There's a comment thread in
> > > the
> > doc
> > >    still open tracking this, so no need to comment here additionally.
> > Just
> > >    pointing it out as one of the things that stands out as
> > > potentially having
> > >    API-level impacts to me that doesn't seem 100% fleshed out in the
> > > doc yet
> > > >    (though that admittedly may just be my limited understanding at
> > > this point
> > >    :-).
> > >
> > > -Tyler
> > >
> > >
> > >
> > >
> > > On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com>
> > wrote:
> > >
> > > > Hi Tyler - my bad. Comments should be enabled now.
> > > >
> > > > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > > <takidau@google.com.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > > I
> > > > seem
> > > > > to have view access only.
> > > > >
> > > > > -Tyler
> > > > >
> > > > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi
> > > > > <ka...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > A number of readers have made comments on this topic recently.
> > > > > > We have created a document that does some analysis of common
> > > > > > ML models and
> > > > > related
> > > > > > APIs. We hope this can drive an approach that will result in
> > > > > > an API, compatibility matrix and involvement from the same
> > > > > > groups that are implementing transformation runners (spark,
> flink, etc).
> > > > > > We welcome comments here or in the document itself.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>

Re: Fwd: machine learning API, common models

Posted by Simone Robutti <si...@radicalbit.io>.

+1

2016-05-27 17:18 GMT+02:00 Kam Kasravi <ka...@gmail.com>:

> Hi Beam ML community
>
> Based on comments from a number of you and some discussion we've had here
> we thought we would suggest the following direction:
>
>    - Begin with primitive operations that are common and critical to almost
>    all ML algorithms. These primitive operators would include:
>       - linear algebra operations - borrowing from established libraries
>       like Samsara.
>       - iterative processing - also central to ML, where replay of datasets
>       is easy to specify, as well as thresholds or halting criteria. This
>       coordinates well with FlinkML's current approach and base APIs.
>       - possibly new broadcast mechanisms not normally available within BSP
>       frameworks such as Beam.
>    - Normalize datasets and parameters that differ across current major ML
>    libraries that offer the same types of models.
>    - Favor a native ML implementation rather than a thin wrapper in order
>    to provide consistency across runners. This will also allow Beam ML to
>    maximize quality and consistency across runners.
>    - Support the languages also supported in the Beam runners (Java,
>    Python, Scala).
>    - Implement several common ML algorithms using the low-level primitives
>    on one or more available runners to validate both the low-level APIs and
>    possible improvements to the high-level API.
>
> scikit-learn pipelines and existing portable libraries like xgboost4j will
> be valuable for modeling the high-level APIs - for example, how xgboost4j
> currently integrates with Spark and Flink.
>
> We welcome further comments and further refinements in approach.
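
To make the "iterative processing" primitive in the proposal above more concrete, here is a rough sketch of a loop that replays a dataset on every pass and stops on a threshold or an iteration cap. It is only an illustration under stated assumptions: `iterate_until` and `gradient_step` are hypothetical names, a Python list stands in for a distributed dataset, and nothing here is an existing Beam or FlinkML API.

```python
from typing import Callable, Sequence, Tuple


def iterate_until(
    step: Callable[[float, Sequence[float]], float],
    initial: float,
    dataset: Sequence[float],
    tolerance: float = 1e-9,
    max_iterations: int = 1000,
) -> Tuple[float, int]:
    """Replay `dataset` through `step` until the state moves by less than
    `tolerance` (threshold) or `max_iterations` is reached (halting criterion).
    Returns the final state and the number of iterations performed."""
    state = initial
    for iteration in range(1, max_iterations + 1):
        new_state = step(state, dataset)
        if abs(new_state - state) < tolerance:
            return new_state, iteration
        state = new_state
    return state, max_iterations


def gradient_step(w: float, data: Sequence[float], learning_rate: float = 0.5) -> float:
    """One gradient-descent step on the loss mean((w - x)^2) / 2,
    whose gradient with respect to w is mean(w - x)."""
    grad = sum(w - x for x in data) / len(data)
    return w - learning_rate * grad
```

Calling `iterate_until(gradient_step, 0.0, [1.0, 2.0, 3.0, 6.0])` drives the estimate toward the mean of the data; in a runner-backed version the per-iteration step would be a distributed computation rather than a local function call.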
>
> On Sun, May 22, 2016 at 7:43 PM, Henry Saputra <he...@gmail.com>
> wrote:
>
> > @Frances:
> >
> > that would be probably the way to go IF we decide to have ML in Beam.
> >
> > @Simone:
> >
> > I would definitely love to see Beam introduce ML model APIs to abstract and
> > unify all "dataflow" runner frameworks, such as Flink ML and Spark ML.
> >
> > However, as you mentioned before, the target audience would be
> > distributed-systems or ML engineers.
> > But I can see that we would then have to provide some out-of-the-box ML
> > algorithms (model training and fine-tuning) in addition to testing the
> > model and APIs.
> >
> > The expectation would be that these models are "production" ready, in
> > which case most will be used by Data Scientists via some configuration,
> > since they won't, and mostly can't, use the Java language.
> >
> > I would love to see instead more on integration with existing ML
> > frameworks like XGBoost [1], Mahout Samsara [2], or DL4J [3] for ML APIs
> > and models in Beam.
> >
> > Thoughts and comments are definitely welcomed =)
> >
> > - Henry
> >
> > [1] https://github.com/dmlc/xgboost
> > [2] https://mahout.apache.org/users/environment/out-of-core-reference.html
> > [3] http://deeplearning4j.org
> > <http://deeplearning4j.org/image-data-pipeline.html#record>
> >
> >
> > On Sat, May 21, 2016 at 2:01 AM, Simone Robutti <
> > simone.robutti@radicalbit.io> wrote:
> >
> > > I think these APIs won't be used by Data Scientists (R, Python) but by
> > > Machine Learning Engineers (Scala, Java or C++ in different
> > > environments) and as a ML Engineer it makes a lot of sense to me to have
> > > such an API if I'm using Beam. It would make a lot more sense to
> > > implement algorithms directly in Beam but that will come in the future,
> > > I hope.
> > >
> > > 2016-05-21 0:35 GMT+02:00 Henry Saputra <he...@gmail.com>:
> > >
> > > > I am a bit concerned about adding ML model APIs to Beam because of the
> > > > fluctuating nature of the ML landscape, and also because in reality most
> > > > data scientists tend to use Python and R for most of their work, with
> > > > existing model definitions.
> > > >
> > > > Even though you could say something like Spark ML is popular, that is
> > > > merely because it is part of Apache Spark rather than because of the
> > > > quality of the ML module itself.
> > > >
> > > > The pipeline and most of the tooling are inspired by scikit-learn, and
> > > > hence it relies on familiarity with that library to attract developers.
> > > >
> > > > My question is whether a fully end-to-end ML API is needed as part of
> > > > the core Beam APIs.
> > > >
> > > > - Henry
> > > >
> > > > On Thu, May 19, 2016 at 5:46 AM, Jianfeng Qian <
> > qianjianfeng@outlook.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > I am quite interested in this proposal.
> > > > > It is great that it considers a lot of machine learning projects.
> > > > > Currently, most algorithms in Spark MLlib are batch-oriented, while
> > > > > oryx2 and streamDM focus on real-time machine learning.
> > > > > Flink is also working with the SAMOA team to integrate stream mining
> > > > > algorithms.
> > > > > So I wonder whether it is possible to design a flexible SDK which
> > > > > allows users to call different third-party packages or their own
> > > > > algorithms?
> > > > >
> > > > > Best,
> > > > > Jianfeng
> > > > >
> > > > > > On 2016-05-17 22:01, Suneel Marthi wrote:
> > > > > > Thanks Simone for pointing this out.
> > > > > >
> > > > > > On the Apache Mahout project we have distributed linear algebra
> > with
> > > > > R-like
> > > > > > semantics that can be executed on Spark/Flink/H2O.
> > > > > >
> > > > > > @Kam: the document u point out is old and outdated, the most
> > > up-to-date
> > > > > > reference to the Samsara api is the book - 'Apache Mahout: Beyond
> > > > > > MapReduce". (shameless marketing here on behalf of fellow
> > committers
> > > > :) )
> > > > > >
> > > > > > We added Flink DataSet API in the recent Mahout 0.12.0 release
> > (April
> > > > 11,
> > > > > > 2016) and has been called out in my talk at ApacheBigData in
> > > Vancouver
> > > > > last
> > > > > > week.
> > > > > >
> > > > > > The Mahout community would definitely be interested in being
> > involved
> > > > > with
> > > > > > this and sharing notes.
> > > > > >
> > > > > > IMHO, the focus should be first on building a good linalg
> > foundations
> > > > > > before embarking on building algos and pipelines. Adding
> @dlyubimov
> > > to
> > > > > this.
> > > > > >
> > > > > >
> > > > > >
> > > > > > ---------- Forwarded message ----------
> > > > > > From: Simone Robutti <si...@radicalbit.io>
> > > > > > Date: Tue, May 17, 2016 at 9:48 AM
> > > > > > Subject: Fwd: machine learning API, common models
> > > > > > To: Suneel Marthi <sm...@apache.org>
> > > > > >
> > > > > >
> > > > > >
> > > > > > ---------- Forwarded message ----------
> > > > > > From: Kavulya, Soila P <so...@intel.com>
> > > > > > Date: 2016-05-17 1:53 GMT+02:00
> > > > > > Subject: RE: machine learning API, common models
> > > > > > To: "dev@beam.incubator.apache.org" <
> dev@beam.incubator.apache.org
> > >
> > > > > >
> > > > > >
> > > > > > Thanks Simone,
> > > > > >
> > > > > > You have raised a valid concern about how different frameworks
> will
> > > > have
> > > > > > different implementations and parameter semantics for the same
> > > > > algorithm. I
> > > > > > agree that it is important to keep this in mind. Hopefully,
> through
> > > > this
> > > > > > exercise, we will identify a good set of common ML abstractions
> > > across
> > > > > > different frameworks.
> > > > > >
> > > > > > Feel free to edit the document. We had limited the first pass of
> > the
> > > > > > comparison matrix to the machine learning pipeline APIs, but we
> can
> > > > > extend
> > > > > > it to include other ML building blocks like linear algebra
> > > operations,
> > > > > and
> > > > > > APIs for optimizers like gradient descent.
> > > > > >
> > > > > > Soila
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> > > > > > Sent: Monday, May 16, 2016 8:22 AM
> > > > > > To: dev@beam.incubator.apache.org
> > > > > > Subject: Re: machine learning API, common models
> > > > > >
> > > > > > Thanks Simone - yes I had read your concerns on dev and I think
> > > they're
> > > > > > well founded.
> > > > > > Thanks for the samsura reference - I've been looking at the
> > > spark/scala
> > > > > > bindings
> > > > >
> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> > > > > > .
> > > > > >
> > > > > > I think we should expand the document to include linear algebraic
> > ops
> > > > or
> > > > > > least pay due diligence to it. If you're doing anything on the
> > flink
> > > > side
> > > > > > in this regard let us or feel free to suggest edits/updates to
> the
> > > > > document.
> > > > > >
> > > > > > Thanks
> > > > > > Kam
> > > > > >
> > > > > > On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> > > > > > simone.robutti@radicalbit.io> wrote:
> > > > > >
> > > > > >> Hello,
> > > > > >>
> > > > > >> I'm Simone and I just began contributing to Flink ML (actually
> on
> > > the
> > > > > >> distributed linalg part). I already expressed my concerns about
> > the
> > > > > >> idea of an high level API relying on specific frameworks'
> > > > > implementations:
> > > > > >> different implementations produce different results and may vary
> > in
> > > > > >> quality. Also the semantics of parameters may change from one
> > > > > >> implementation to the other. This could hinder portability and
> > > > > >> transparency. I believe these problems could be handled paying
> the
> > > due
> > > > > >> attention to the details of every single implementation but I
> > invite
> > > > > >> you not to underestimate these problems.
> > > > > >>
> > > > > >> On the other hand the API in itself looks good to me. From my
> > side,
> > > I
> > > > > >> hope to fill some of the gaps in Flink you underlined in the
> > > > comparison
> > > > > > matrix.
> > > > > >> Talking about matrices, proper matrices this time, I believe it
> > > would
> > > > > >> be useful to include in this API support for linear algebra
> > > > operations.
> > > > > >> Something similar is already present in Mahout's Samsara and it
> > > looks
> > > > > >> really good but clearly a similar implementation on Beam would
> be
> > > way
> > > > > >> more interesting and powerful.
> > > > > >>
> > > > > >> My 2 cents,
> > > > > >>
> > > > > >> Simone
> > > > > >>
> > > > > >>
> > > > > >> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <
> > > soila.p.kavulya@intel.com
> > > > >:
> > > > > >>
> > > > > >>> Hi Tyler,
> > > > > >>>
> > > > > >>> Thank you so much for your feedback. I agree that starting with
> > the
> > > > > >>> high-level API is a good direction. We are interested in Python
> > > > > >>> because
> > > > > >> it
> > > > > >>> is the language that our data scientists are most familiar
> with.
> > I
> > > > > >>> think starting with Java would be the best approach, because
> the
> > > > > >>> Python API can be a thin wrapper for Java API.
> > > > > >>>
> > > > > >>> In Spark, the Scala, Java and Python APIs are identical. Flink
> > does
> > > > > >>> not have a Python API for ML pipelines at present.
> > > > > >>>
> > > > > >>> Could you point me to the updated runner API?
> > > > > >>>
> > > > > >>> Soila
> > > > > >>>
> > > > > >>> -----Original Message-----
> > > > > >>> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > > > > >>> Sent: Friday, May 13, 2016 6:34 PM
> > > > > >>> To: dev@beam.incubator.apache.org
> > > > > >>> Subject: Re: machine learning API, common models
> > > > > >>>
> > > > > >>> Hi Kam & Soila,
> > > > > >>>
> > > > > >>> Thanks a lot for writing this up. I ran the doc past some of
> the
> > > > > >>> folks who've been doing ML work here at Google, and they were
> > > > > >>> generally happy with the distillation of common methods in the
> > doc.
> > > > > >>> I'd be curious to
> > > > > >> hear
> > > > > >>> what folks on the Flink- and Spark- runner sides think.
> > > > > >>>
> > > > > >>> To me, this seems like a good direction for a high-level API.
> > > > > >>> Presumably, once a high-level API is in place, we could begin
> > > > > >>> looking at what it
> > > > > >> would
> > > > > >>> take to add lower-level ML algorithm support (e.g. iterative)
> to
> > > the
> > > > > >>> Beam Model. Is this essentially what you're thinking?
> > > > > >>>
> > > > > >>> Some more specific questions/comments:
> > > > > >>>
> > > > > >>>     - Presumably you'd want to tackle this in Java first, since
> > > > that's
> > > > > > the
> > > > > >>>     only language we currently support? Given that half of your
> > > > > >>> examples are in
> > > > > >>>     Python, I'm also assuming Python will be interesting once
> > it's
> > > > > >>> available.
> > > > > >>>
> > > > > >>>     - Along those lines, what languages are represented in the
> > > > > capability
> > > > > >>>     matrix? E.g. is Spark ML support as detailed there
> identical
> > > > across
> > > > > >>>     Java/Scala and Python?
> > > > > >>>
> > > > > >>>     - Have you thought about how this would tie in at the
> runner
> > > > level,
> > > > > >>>     particularly given the updated Runner API changes that are
> > > > coming?
> > > > > > I'm
> > > > > >>>     assuming they'd be provided as composite transforms that
> (for
> > > > > >>> now)
> > > > > >> would
> > > > > >>>     have no default implementation, given the lack of low-level
> > > > > >>> primitives for
> > > > > >>>     ML algorithms, but am curious what your thoughts are there.
> > > > > >>>
> > > > > >>>     - I still don't fully understand how incremental updates
> due
> > to
> > > > > model
> > > > > >>>     drift would tie in at the API level. There's a comment
> thread
> > > in
> > > > > >>> the
> > > > > >> doc
> > > > > >>>     still open tracking this, so no need to comment here
> > > > additionally.
> > > > > >> Just
> > > > > >>>     pointing it out as one of the things that stands out as
> > > > > >>> potentially having
> > > > > >>>     API-level impacts to me that doesn't seem 100% fleshed out
> in
> > > the
> > > > > >>> doc yet
> > > > > >>>     (thought that admittedly may just be my limited
> understanding
> > > at
> > > > > >>> this point
> > > > > >>>     :-).
> > > > > >>>
> > > > > >>> -Tyler
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <
> > kamkasravi@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >>>> Hi Tyler - my bad. Comments should be enabled now.
> > > > > >>>>
> > > > > >>>> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > > > >>>> <takidau@google.com.invalid
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Thanks a lot, Kam. Can you please enable comment access on
> the
> > > doc?
> > > > > >>>>> I
> > > > > >>>> seem
> > > > > >>>>> to have view access only.
> > > > > >>>>>
> > > > > >>>>> -Tyler
> > > > > >>>>>
> > > > > >>>>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi
> > > > > >>>>> <ka...@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>>>> Hi
> > > > > >>>>>>
> > > > > >>>>>> A number of readers have made comments on this topic
> recently.
> > > > > >>>>>> We have created a document that does some analysis of common
> > > > > >>>>>> ML models and
> > > > > >>>>> related
> > > > > >>>>>> APIs. We hope this can drive an approach that will result in
> > > > > >>>>>> an API, compatibility matrix and involvement from the same
> > > > > >>>>>> groups that are implementing transformation runners (spark,
> > > > > > flink, etc).
> > > > > >>>>>> We welcome comments here or in the document itself.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Fwd: machine learning API, common models

Posted by Kam Kasravi <ka...@gmail.com>.

Hi Beam ML community

Based on comments from a number of you and some discussion we've had here
we thought we would suggest the following direction:

   - Begin with primitive operations that are common and critical to almost
   all ML algorithms. These primitive operators would include:
      - linear algebra operations - borrowing from established libraries
      like Samsara.
      - iterative processing - also central to ML, where replay of datasets
      is easy to specify, as are thresholds or halting criteria. This
      coordinates well with FlinkML's current approach and base APIs.
      - possibly new broadcast mechanisms not normally available within BSP
      frameworks such as Beam.
   - Normalize datasets and parameters that differ across the current major
   ML libraries that offer the same types of models.
   - Favor a native ML implementation rather than a thin wrapper in order
   to provide consistency across runners. This will also allow Beam ML to
   maximize quality and minimize consistency issues across runners.
   - Support the languages that are also supported in the Beam runners (Java,
   Python, Scala).
   - Implement several common ML algorithms using the low-level primitives
   on one or more available runners to validate both the low-level APIs and
   possible improvements to the high-level API.
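
To make the first bullet a little more concrete, here is a rough sketch in
plain Python of how a linear algebra primitive (a dot product) and an
iterative primitive with a halting threshold might fit together. None of
this is Beam API: `dot`, `gradient_step` and `iterate_until` are hypothetical
names chosen only for illustration, and the data is a toy example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Linear algebra primitive: inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gradient_step(weights: Vector, data: List[Tuple[Vector, float]], lr: float) -> Vector:
    """One full replay of the dataset: a gradient step for least-squares regression."""
    grad = [0.0] * len(weights)
    for features, label in data:
        error = dot(weights, features) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    n = len(data)
    return [w - lr * g / n for w, g in zip(weights, grad)]


def iterate_until(step: Callable[[Vector], Vector], state: Vector,
                  tolerance: float, max_iterations: int) -> Tuple[Vector, int]:
    """Iterative primitive: re-apply `step` until the state changes by less
    than `tolerance` (the halting criterion) or `max_iterations` is reached."""
    for iteration in range(1, max_iterations + 1):
        new_state = step(state)
        change = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if change < tolerance:
            return state, iteration
    return state, max_iterations


# Toy dataset following y = 2 * x, with a single feature.
data = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
weights, iterations = iterate_until(
    lambda w: gradient_step(w, data, lr=0.1), [0.0], tolerance=1e-9, max_iterations=1000
)
```

In a distributed setting the replay in `gradient_step` would be the part
handed to the runner, while the loop and halting check in `iterate_until`
are what an iteration primitive would need to express.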

Scikit-learn pipelines and existing portable libraries like xgboost4j will
be valuable to model the high-level APIs - for example how xgboost4j
currently integrates with Spark and Flink.
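
For readers less familiar with the scikit-learn style, the fit/transform/
predict shape that such a high-level API could borrow looks roughly like the
sketch below. It is written in plain Python and does not use scikit-learn or
Beam; `MeanCenter`, `ThresholdClassifier` and `Pipeline` are hypothetical
classes made up for this example.

```python
class MeanCenter:
    """Hypothetical transformer: learns the mean in fit(), subtracts it in transform()."""

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values):
        return [v - self.mean_ for v in values]


class ThresholdClassifier:
    """Hypothetical estimator: splits at the midpoint between the two class means."""

    def fit(self, values, labels):
        positives = [v for v, y in zip(values, labels) if y == 1]
        negatives = [v for v, y in zip(values, labels) if y == 0]
        self.threshold_ = (sum(positives) / len(positives)
                           + sum(negatives) / len(negatives)) / 2
        return self

    def predict(self, values):
        return [1 if v > self.threshold_ else 0 for v in values]


class Pipeline:
    """Chains transformers and a final estimator behind one fit/predict pair."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, values, labels):
        for transformer in self.transformers:
            values = transformer.fit(values).transform(values)
        self.estimator.fit(values, labels)
        return self

    def predict(self, values):
        # Reuse the statistics learned during fit; do not refit on new data.
        for transformer in self.transformers:
            values = transformer.transform(values)
        return self.estimator.predict(values)


pipeline = Pipeline([MeanCenter()], ThresholdClassifier())
pipeline.fit([1.0, 2.0, 3.0, 6.0], [0, 0, 0, 1])
predictions = pipeline.predict([0.0, 10.0])
```

The point of the shape is that each stage keeps its own learned state, so
the same pipeline object can be fitted once and then applied to new data.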

We welcome further comments and further refinements in approach.

On Sun, May 22, 2016 at 7:43 PM, Henry Saputra <he...@gmail.com>
wrote:

> @Frances:
>
> That would probably be the way to go IF we decide to have ML in Beam.
>
> @Simone:
>
> I would definitely love to see Beam introduce ML model APIs to abstract and
> unify all "dataflow" runner frameworks, such as Flink ML and Spark ML.
>
> However, as you mentioned before, the target audience would be distributed
> systems or ML engineers.
> But I could see that we would then have to provide some out-of-the-box ML
> algorithms (model training and fine-tuning) in addition to testing the model
> and APIs.
>
> The expectation would be that these models are "production" ready, and in
> most cases they will be used by Data Scientists via some configuration,
> since they won't, and most can't, use the Java language.
>
> I would instead love to see more integration with existing ML frameworks
> like XGBoost [1], Mahout Samsara [2], or DL4J [3] for ML APIs and models in
> Beam.
>
> Thoughts and comments are definitely welcome =)
>
> - Henry
>
> [1] https://github.com/dmlc/xgboost
> [2] https://mahout.apache.org/users/environment/out-of-core-reference.html
> [3] http://deeplearning4j.org
> <http://deeplearning4j.org/image-data-pipeline.html#record>
>
>
> On Sat, May 21, 2016 at 2:01 AM, Simone Robutti <
> simone.robutti@radicalbit.io> wrote:
>
> > I think these APIs won't be used by Data Scientists (R, Python) but by
> > Machine Learning Engineers (Scala, Java or C++ in different environments)
> > and as a ML Engineer it makes a lot of sense to me to have such an API if
> > I'm using Beam. It would make a lot more sense to implement algorithms
> > directly in Beam but that will come in the future, I hope.
> >
> > 2016-05-21 0:35 GMT+02:00 Henry Saputra <he...@gmail.com>:
> >
> > > I am a bit concern about adding ML model APIs to Beam because the
> > fluctuate
> > > nature of ML landscape and also in reality, most data scientists tend
> to
> > > use Python and R most the work with existing model definition.
> > >
> > > Even though you could say something like Spark ML is popular, it is
> > merely
> > > because it is involving Apache Spark rather than quality of the ML
> module
> > > itself.
> > >
> > > The pipeline and most of the tooling are inspired by scikit-learn, and
> > > hence it is relying on familiarity of the library to attract
> developers.
> > >
> > > My question is whether fully end to end ML APIs is needed as part of
> core
> > > Beam APIs.
> > >
> > > - Henry
> > >
> > > On Thu, May 19, 2016 at 5:46 AM, Jianfeng Qian <
> qianjianfeng@outlook.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > > I am quite interested about this proposal.
> > > > it is great to consider a lot of machine learning projects.
> > > > Currently, most algorithms of spark mllib are batch processing, while
> > > > oryx2 and streamDM focus on real-time machine learning.
> > > > And Flink works with SAMOA team to integrate stream mining
> algorithms,
> > > too.
> > > > So I wonder is that possible to design A flexible SDK which allow
> user
> > > > to call different third party packages or their own algorithms?
> > > >
> > > > Best,
> > > > Jianfeng
> > > >
> > > > On 2016-05-17 22:01, Suneel Marthi wrote:
> > > > > Thanks Simone for pointing this out.
> > > > >
> > > > > On the Apache Mahout project we have distributed linear algebra
> with
> > > > R-like
> > > > > semantics that can be executed on Spark/Flink/H2O.
> > > > >
> > > > > @Kam: the document u point out is old and outdated, the most
> > up-to-date
> > > > > reference to the Samsara api is the book - 'Apache Mahout: Beyond
> > > > > MapReduce". (shameless marketing here on behalf of fellow
> committers
> > > :) )
> > > > >
> > > > > We added Flink DataSet API in the recent Mahout 0.12.0 release
> (April
> > > 11,
> > > > > 2016) and has been called out in my talk at ApacheBigData in
> > Vancouver
> > > > last
> > > > > week.
> > > > >
> > > > > The Mahout community would definitely be interested in being
> involved
> > > > with
> > > > > this and sharing notes.
> > > > >
> > > > > IMHO, the focus should be first on building a good linalg
> foundations
> > > > > before embarking on building algos and pipelines. Adding @dlyubimov
> > to
> > > > this.
> > > > >
> > > > >
> > > > >
> > > > > ---------- Forwarded message ----------
> > > > > From: Simone Robutti <si...@radicalbit.io>
> > > > > Date: Tue, May 17, 2016 at 9:48 AM
> > > > > Subject: Fwd: machine learning API, common models
> > > > > To: Suneel Marthi <sm...@apache.org>
> > > > >
> > > > >
> > > > >
> > > > > ---------- Forwarded message ----------
> > > > > From: Kavulya, Soila P <so...@intel.com>
> > > > > Date: 2016-05-17 1:53 GMT+02:00
> > > > > Subject: RE: machine learning API, common models
> > > > > To: "dev@beam.incubator.apache.org" <dev@beam.incubator.apache.org
> >
> > > > >
> > > > >
> > > > > Thanks Simone,
> > > > >
> > > > > You have raised a valid concern about how different frameworks will
> > > have
> > > > > different implementations and parameter semantics for the same
> > > > algorithm. I
> > > > > agree that it is important to keep this in mind. Hopefully, through
> > > this
> > > > > exercise, we will identify a good set of common ML abstractions
> > across
> > > > > different frameworks.
> > > > >
> > > > > Feel free to edit the document. We had limited the first pass of
> the
> > > > > comparison matrix to the machine learning pipeline APIs, but we can
> > > > extend
> > > > > it to include other ML building blocks like linear algebra
> > operations,
> > > > and
> > > > > APIs for optimizers like gradient descent.
> > > > >
> > > > > Soila
> > > > >
> > > > > -----Original Message-----
> > > > > From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> > > > > Sent: Monday, May 16, 2016 8:22 AM
> > > > > To: dev@beam.incubator.apache.org
> > > > > Subject: Re: machine learning API, common models
> > > > >
> > > > > Thanks Simone - yes I had read your concerns on dev and I think
> > they're
> > > > > well founded.
> > > > > Thanks for the samsura reference - I've been looking at the
> > spark/scala
> > > > > bindings
> > > > http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> > > > > .
> > > > >
> > > > > I think we should expand the document to include linear algebraic
> ops
> > > or
> > > > > least pay due diligence to it. If you're doing anything on the
> flink
> > > side
> > > > > in this regard let us or feel free to suggest edits/updates to the
> > > > document.
> > > > >
> > > > > Thanks
> > > > > Kam
> > > > >
> > > > > On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> > > > > simone.robutti@radicalbit.io> wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> I'm Simone and I just began contributing to Flink ML (actually on
> > the
> > > > >> distributed linalg part). I already expressed my concerns about
> the
> > > > >> idea of an high level API relying on specific frameworks'
> > > > implementations:
> > > > >> different implementations produce different results and may vary
> in
> > > > >> quality. Also the semantics of parameters may change from one
> > > > >> implementation to the other. This could hinder portability and
> > > > >> transparency. I believe these problems could be handled paying the
> > due
> > > > >> attention to the details of every single implementation but I
> invite
> > > > >> you not to underestimate these problems.
> > > > >>
> > > > >> On the other hand the API in itself looks good to me. From my
> side,
> > I
> > > > >> hope to fill some of the gaps in Flink you underlined in the
> > > comparison
> > > > > matrix.
> > > > >> Talking about matrices, proper matrices this time, I believe it
> > would
> > > > >> be useful to include in this API support for linear algebra
> > > operations.
> > > > >> Something similar is already present in Mahout's Samsara and it
> > looks
> > > > >> really good but clearly a similar implementation on Beam would be
> > way
> > > > >> more interesting and powerful.
> > > > >>
> > > > >> My 2 cents,
> > > > >>
> > > > >> Simone
> > > > >>
> > > > >>
> > > > >> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <
> > soila.p.kavulya@intel.com
> > > >:
> > > > >>
> > > > >>> Hi Tyler,
> > > > >>>
> > > > >>> Thank you so much for your feedback. I agree that starting with
> the
> > > > >>> high-level API is a good direction. We are interested in Python
> > > > >>> because
> > > > >> it
> > > > >>> is the language that our data scientists are most familiar with.
> I
> > > > >>> think starting with Java would be the best approach, because the
> > > > >>> Python API can be a thin wrapper for Java API.
> > > > >>>
> > > > >>> In Spark, the Scala, Java and Python APIs are identical. Flink
> does
> > > > >>> not have a Python API for ML pipelines at present.
> > > > >>>
> > > > >>> Could you point me to the updated runner API?
> > > > >>>
> > > > >>> Soila
> > > > >>>
> > > > >>> -----Original Message-----
> > > > >>> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > > > >>> Sent: Friday, May 13, 2016 6:34 PM
> > > > >>> To: dev@beam.incubator.apache.org
> > > > >>> Subject: Re: machine learning API, common models
> > > > >>>
> > > > >>> Hi Kam & Soila,
> > > > >>>
> > > > >>> Thanks a lot for writing this up. I ran the doc past some of the
> > > > >>> folks who've been doing ML work here at Google, and they were
> > > > >>> generally happy with the distillation of common methods in the
> doc.
> > > > >>> I'd be curious to
> > > > >> hear
> > > > >>> what folks on the Flink- and Spark- runner sides think.
> > > > >>>
> > > > >>> To me, this seems like a good direction for a high-level API.
> > > > >>> Presumably, once a high-level API is in place, we could begin
> > > > >>> looking at what it
> > > > >> would
> > > > >>> take to add lower-level ML algorithm support (e.g. iterative) to
> > the
> > > > >>> Beam Model. Is this essentially what you're thinking?
> > > > >>>
> > > > >>> Some more specific questions/comments:
> > > > >>>
> > > > >>>     - Presumably you'd want to tackle this in Java first, since
> > > that's
> > > > > the
> > > > >>>     only language we currently support? Given that half of your
> > > > >>> examples are in
> > > > >>>     Python, I'm also assuming Python will be interesting once
> it's
> > > > >>> available.
> > > > >>>
> > > > >>>     - Along those lines, what languages are represented in the
> > > > capability
> > > > >>>     matrix? E.g. is Spark ML support as detailed there identical
> > > across
> > > > >>>     Java/Scala and Python?
> > > > >>>
> > > > >>>     - Have you thought about how this would tie in at the runner
> > > level,
> > > > >>>     particularly given the updated Runner API changes that are
> > > coming?
> > > > > I'm
> > > > >>>     assuming they'd be provided as composite transforms that (for
> > > > >>> now)
> > > > >> would
> > > > >>>     have no default implementation, given the lack of low-level
> > > > >>> primitives for
> > > > >>>     ML algorithms, but am curious what your thoughts are there.
> > > > >>>
> > > > >>>     - I still don't fully understand how incremental updates due
> to
> > > > model
> > > > >>>     drift would tie in at the API level. There's a comment thread
> > in
> > > > >>> the
> > > > >> doc
> > > > >>>     still open tracking this, so no need to comment here
> > > additionally.
> > > > >> Just
> > > > >>>     pointing it out as one of the things that stands out as
> > > > >>> potentially having
> > > > >>>     API-level impacts to me that doesn't seem 100% fleshed out in
> > the
> > > > >>> doc yet
> > > > >>>     (thought that admittedly may just be my limited understanding
> > at
> > > > >>> this point
> > > > >>>     :-).
> > > > >>>
> > > > >>> -Tyler
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <
> kamkasravi@gmail.com
> > >
> > > > >> wrote:
> > > > >>>> Hi Tyler - my bad. Comments should be enabled now.
> > > > >>>>
> > > > >>>> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > > >>>> <takidau@google.com.invalid
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Thanks a lot, Kam. Can you please enable comment access on the
> > doc?
> > > > >>>>> I
> > > > >>>> seem
> > > > >>>>> to have view access only.
> > > > >>>>>
> > > > >>>>> -Tyler
> > > > >>>>>
> > > > >>>>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi
> > > > >>>>> <ka...@gmail.com>
> > > > >>>> wrote:
> > > > >>>>>> Hi
> > > > >>>>>>
> > > > >>>>>> A number of readers have made comments on this topic recently.
> > > > >>>>>> We have created a document that does some analysis of common
> > > > >>>>>> ML models and
> > > > >>>>> related
> > > > >>>>>> APIs. We hope this can drive an approach that will result in
> > > > >>>>>> an API, compatibility matrix and involvement from the same
> > > > >>>>>> groups that are implementing transformation runners (spark,
> > > > > flink, etc).
> > > > >>>>>> We welcome comments here or in the document itself.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > >
> > > >
> > >
> >
>

Re: Fwd: machine learning API, common models

Posted by Henry Saputra <he...@gmail.com>.

@Frances:

That would probably be the way to go IF we decide to have ML in Beam.

@Simone:

I would definitely love to see Beam introduce ML model APIs to abstract and
unify all "dataflow" runner frameworks, such as Flink ML and Spark ML.

However, as you mentioned before, the target audience would be distributed
systems or ML engineers.
But I could see that we would then have to provide some out-of-the-box ML
algorithms (model training and fine-tuning) in addition to testing the model
and APIs.

The expectation would be that these models are "production" ready, and in
most cases they will be used by Data Scientists via some configuration,
since they won't, and most can't, use the Java language.

I would instead love to see more integration with existing ML frameworks
like XGBoost [1], Mahout Samsara [2], or DL4J [3] for ML APIs and models in
Beam.
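
One way to picture such an integration layer is a thin adapter that maps a
shared set of parameter names onto whatever each underlying library expects,
and fails loudly when a library has no equivalent. The sketch below is plain
Python under that assumption; `BackendAdapter`, both backends and all of
their parameter names are invented for illustration and do not correspond to
XGBoost, Samsara, DL4J or any other real library.

```python
from typing import Dict


class BackendAdapter:
    """Hypothetical adapter: translates shared parameter names into the names
    that one particular (here imaginary) ML library expects."""

    def __init__(self, name: str, native_names: Dict[str, str]):
        self.name = name
        self.native_names = native_names

    def translate(self, params: Dict[str, float]) -> Dict[str, float]:
        unsupported = sorted(set(params) - set(self.native_names))
        if unsupported:
            # Surface the gap instead of silently dropping a parameter.
            raise ValueError(f"{self.name} does not support: {unsupported}")
        return {self.native_names[key]: value for key, value in params.items()}


# Both backends and their native parameter names are made up for illustration.
backend_a = BackendAdapter("backend_a", {"learning_rate": "lr_a", "max_depth": "depth_a"})
backend_b = BackendAdapter("backend_b", {"learning_rate": "step_b"})

shared = {"learning_rate": 0.1}
native_a = backend_a.translate(shared)
native_b = backend_b.translate(shared)
```

A mapping like this only covers naming; it does not address the earlier
concern that the same parameter can have different semantics in different
implementations.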

Thoughts and comments are definitely welcome =)

- Henry

[1] https://github.com/dmlc/xgboost
[2] https://mahout.apache.org/users/environment/out-of-core-reference.html
[3] http://deeplearning4j.org
<http://deeplearning4j.org/image-data-pipeline.html#record>


On Sat, May 21, 2016 at 2:01 AM, Simone Robutti <
simone.robutti@radicalbit.io> wrote:

> I think these APIs won't be used by Data Scientists (R, Python) but by
> Machine Learning Engineers (Scala, Java or C++ in different environments)
> and as an ML Engineer it makes a lot of sense to me to have such an API if
> I'm using Beam. It would make a lot more sense to implement algorithms
> directly in Beam but that will come in the future, I hope.
>
> 2016-05-21 0:35 GMT+02:00 Henry Saputra <he...@gmail.com>:
>
> > I am a bit concerned about adding ML model APIs to Beam because of the
> > fluctuating nature of the ML landscape, and also because in reality most
> > data scientists tend to use Python and R for most of their work, with
> > existing model definitions.
> >
> > Even though you could say something like Spark ML is popular, that is
> > merely because it involves Apache Spark rather than because of the
> > quality of the ML module itself.
> >
> > The pipeline and most of the tooling are inspired by scikit-learn, and
> > hence it relies on familiarity with that library to attract developers.
> >
> > My question is whether a fully end-to-end ML API is needed as part of the
> > core Beam APIs.
> >
> > - Henry
> >
> > On Thu, May 19, 2016 at 5:46 AM, Jianfeng Qian <qianjianfeng@outlook.com> wrote:
> >
> > > Hi,
> > > I am quite interested in this proposal.
> > > It is great to consider a lot of machine learning projects.
> > > Currently, most algorithms of Spark MLlib are batch processing, while
> > > oryx2 and streamDM focus on real-time machine learning.
> > > And Flink works with the SAMOA team to integrate stream mining
> > > algorithms, too.
> > > So I wonder whether it is possible to design a flexible SDK which allows
> > > users to call different third-party packages or their own algorithms?
> > >
> > > Best,
> > > Jianfeng
> > >
> > > On 2016-05-17 22:01, Suneel Marthi wrote:
> > > > Thanks Simone for pointing this out.
> > > >
> > > > On the Apache Mahout project we have distributed linear algebra with
> > > > R-like semantics that can be executed on Spark/Flink/H2O.
> > > >
> > > > @Kam: the document you point to is old and outdated; the most
> > > > up-to-date reference to the Samsara API is the book "Apache Mahout:
> > > > Beyond MapReduce". (shameless marketing here on behalf of fellow
> > > > committers :) )
> > > >
> > > > We added the Flink DataSet API in the recent Mahout 0.12.0 release
> > > > (April 11, 2016), and it was called out in my talk at ApacheBigData in
> > > > Vancouver last week.
> > > >
> > > > The Mahout community would definitely be interested in being involved
> > > > with this and sharing notes.
> > > >
> > > > IMHO, the focus should first be on building a good linalg foundation
> > > > before embarking on building algos and pipelines. Adding @dlyubimov to
> > > > this.
> > > >
> > > >
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Simone Robutti <si...@radicalbit.io>
> > > > Date: Tue, May 17, 2016 at 9:48 AM
> > > > Subject: Fwd: machine learning API, common models
> > > > To: Suneel Marthi <sm...@apache.org>
> > > >
> > > >
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Kavulya, Soila P <so...@intel.com>
> > > > Date: 2016-05-17 1:53 GMT+02:00
> > > > Subject: RE: machine learning API, common models
> > > > To: "dev@beam.incubator.apache.org" <de...@beam.incubator.apache.org>
> > > >
> > > >
> > > > Thanks Simone,
> > > >
> > > > You have raised a valid concern about how different frameworks will
> > > > have different implementations and parameter semantics for the same
> > > > algorithm. I agree that it is important to keep this in mind.
> > > > Hopefully, through this exercise, we will identify a good set of
> > > > common ML abstractions across different frameworks.
> > > >
> > > > Feel free to edit the document. We had limited the first pass of the
> > > > comparison matrix to the machine learning pipeline APIs, but we can
> > > > extend it to include other ML building blocks like linear algebra
> > > > operations, and APIs for optimizers like gradient descent.
> > > >
> > > > Soila
> > > >
> > > > -----Original Message-----
> > > > From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> > > > Sent: Monday, May 16, 2016 8:22 AM
> > > > To: dev@beam.incubator.apache.org
> > > > Subject: Re: machine learning API, common models
> > > >
> > > > Thanks Simone - yes I had read your concerns on dev and I think
> > > > they're well founded.
> > > > Thanks for the Samsara reference - I've been looking at the
> > > > spark/scala bindings
> > > > http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf .
> > > >
> > > > I think we should expand the document to include linear algebraic ops,
> > > > or at least pay due diligence to them. If you're doing anything on the
> > > > Flink side in this regard, let us know or feel free to suggest
> > > > edits/updates to the document.
> > > >
> > > > Thanks
> > > > Kam
> > > >
> > > > On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> > > > simone.robutti@radicalbit.io> wrote:
> > > >
> > > >> Hello,
> > > >>
> > > > >> I'm Simone and I just began contributing to Flink ML (actually on the
> > > > >> distributed linalg part). I already expressed my concerns about the
> > > > >> idea of a high-level API relying on specific frameworks'
> > > > >> implementations: different implementations produce different results
> > > > >> and may vary in quality. Also, the semantics of parameters may change
> > > > >> from one implementation to another. This could hinder portability and
> > > > >> transparency. I believe these problems could be handled by paying due
> > > > >> attention to the details of every single implementation, but I invite
> > > > >> you not to underestimate them.
> > > > >>
> > > > >> On the other hand, the API in itself looks good to me. From my side, I
> > > > >> hope to fill some of the gaps in Flink that you highlighted in the
> > > > >> comparison matrix.
> > > > >> Talking about matrices, proper matrices this time, I believe it would
> > > > >> be useful to include in this API support for linear algebra operations.
> > > > >> Something similar is already present in Mahout's Samsara and it looks
> > > > >> really good, but clearly a similar implementation on Beam would be way
> > > > >> more interesting and powerful.
> > > >>
> > > >> My 2 cents,
> > > >>
> > > >> Simone
> > > >>
> > > >>
> > > >> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <
> soila.p.kavulya@intel.com
> > >:
> > > >>
> > > >>> Hi Tyler,
> > > >>>
> > > > >>> Thank you so much for your feedback. I agree that starting with the
> > > > >>> high-level API is a good direction. We are interested in Python
> > > > >>> because it is the language that our data scientists are most familiar
> > > > >>> with. I think starting with Java would be the best approach, because
> > > > >>> the Python API can be a thin wrapper for the Java API.
> > > >>>
> > > >>> In Spark, the Scala, Java and Python APIs are identical. Flink does
> > > >>> not have a Python API for ML pipelines at present.
> > > >>>
> > > >>> Could you point me to the updated runner API?
> > > >>>
> > > >>> Soila
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > > >>> Sent: Friday, May 13, 2016 6:34 PM
> > > >>> To: dev@beam.incubator.apache.org
> > > >>> Subject: Re: machine learning API, common models
> > > >>>
> > > >>> Hi Kam & Soila,
> > > >>>
> > > > >>> Thanks a lot for writing this up. I ran the doc past some of the
> > > > >>> folks who've been doing ML work here at Google, and they were
> > > > >>> generally happy with the distillation of common methods in the doc.
> > > > >>> I'd be curious to hear what folks on the Flink- and Spark- runner
> > > > >>> sides think.
> > > > >>>
> > > > >>> To me, this seems like a good direction for a high-level API.
> > > > >>> Presumably, once a high-level API is in place, we could begin
> > > > >>> looking at what it would take to add lower-level ML algorithm
> > > > >>> support (e.g. iterative) to the Beam Model. Is this essentially
> > > > >>> what you're thinking?
> > > >>>
> > > >>> Some more specific questions/comments:
> > > >>>
> > > > >>>     - Presumably you'd want to tackle this in Java first, since that's
> > > > >>>     the only language we currently support? Given that half of your
> > > > >>>     examples are in Python, I'm also assuming Python will be
> > > > >>>     interesting once it's available.
> > > > >>>
> > > > >>>     - Along those lines, what languages are represented in the
> > > > >>>     capability matrix? E.g. is Spark ML support as detailed there
> > > > >>>     identical across Java/Scala and Python?
> > > > >>>
> > > > >>>     - Have you thought about how this would tie in at the runner level,
> > > > >>>     particularly given the updated Runner API changes that are coming?
> > > > >>>     I'm assuming they'd be provided as composite transforms that (for
> > > > >>>     now) would have no default implementation, given the lack of
> > > > >>>     low-level primitives for ML algorithms, but am curious what your
> > > > >>>     thoughts are there.
> > > > >>>
> > > > >>>     - I still don't fully understand how incremental updates due to
> > > > >>>     model drift would tie in at the API level. There's a comment thread
> > > > >>>     in the doc still open tracking this, so no need to comment here
> > > > >>>     additionally. Just pointing it out as one of the things that stands
> > > > >>>     out as potentially having API-level impacts to me that doesn't seem
> > > > >>>     100% fleshed out in the doc yet (though that admittedly may just be
> > > > >>>     my limited understanding at this point :-).
> > > >>>
> > > >>> -Tyler
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > > >>> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <kamkasravi@gmail.com>
> > > > >>> wrote:
> > > > >>>> Hi Tyler - my bad. Comments should be enabled now.
> > > > >>>>
> > > > >>>> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > > >>>> <takidau@google.com.invalid> wrote:
> > > > >>>>
> > > > >>>>> Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > >>>>> I seem to have view access only.
> > > > >>>>>
> > > > >>>>> -Tyler
> > > > >>>>>
> > > > >>>>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>> Hi
> > > > >>>>>>
> > > > >>>>>> A number of readers have made comments on this topic recently.
> > > > >>>>>> We have created a document that does some analysis of common
> > > > >>>>>> ML models and related APIs. We hope this can drive an approach
> > > > >>>>>> that will result in an API, compatibility matrix and involvement
> > > > >>>>>> from the same groups that are implementing transformation runners
> > > > >>>>>> (spark, flink, etc). We welcome comments here or in the document
> > > > >>>>>> itself.
> > > > >>>>>>
> > > > >>>>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > >
> > >
> >
>

Re: Fwd: machine learning API, common models

Posted by Simone Robutti <si...@radicalbit.io>.

I think these APIs won't be used by Data Scientists (R, Python) but by
Machine Learning Engineers (Scala, Java or C++ in different environments).
As an ML Engineer, it makes a lot of sense to me to have such an API if
I'm using Beam. It would make a lot more sense to implement algorithms
directly in Beam, but that will come in the future, I hope.

2016-05-21 0:35 GMT+02:00 Henry Saputra <he...@gmail.com>:

> I am a bit concerned about adding ML model APIs to Beam because of the
> fluctuating nature of the ML landscape and also because, in reality, most
> data scientists tend to use Python and R for most of their work with
> existing model definitions.
>
> Even though you could say something like Spark ML is popular, that is merely
> because it involves Apache Spark rather than because of the quality of the
> ML module itself.
>
> The pipeline and most of the tooling are inspired by scikit-learn, and
> hence it relies on familiarity with that library to attract developers.
>
> My question is whether fully end-to-end ML APIs are needed as part of the
> core Beam APIs.
>
> - Henry

Re: Fwd: machine learning API, common models

Posted by Frances Perry <fj...@google.com.INVALID>.

We could have a module with a library of PTransforms (similar to the join
library in extensions) -- so it wouldn't be part of the core / required SDK.
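
As a rough illustration of that layout (a sketch only, written in plain
Python rather than against the Beam SDK; every name below, including
Transform, Composite and ML_EXTENSIONS, is hypothetical): the core defines
the transform interface, and an optional extension module ships composites
built purely from existing transforms, so nothing in the core depends on it.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List


class Transform(ABC):
    """Hypothetical stand-in for a pipeline transform (not Beam's PTransform)."""

    @abstractmethod
    def expand(self, rows: Iterable[List[float]]) -> List[List[float]]:
        """Apply the transform to a collection of numeric rows."""


class Scale(Transform):
    """Primitive step: multiply every value by a constant factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def expand(self, rows: Iterable[List[float]]) -> List[List[float]]:
        return [[v * self.factor for v in row] for row in rows]


class Center(Transform):
    """Primitive step: subtract the per-column mean from every value."""

    def expand(self, rows: Iterable[List[float]]) -> List[List[float]]:
        rows = [list(r) for r in rows]
        if not rows:
            return []
        means = [sum(col) / len(rows) for col in zip(*rows)]
        return [[v - m for v, m in zip(row, means)] for row in rows]


class Composite(Transform):
    """Composite transform: defined only in terms of other transforms."""

    def __init__(self, *steps: Transform) -> None:
        self.steps = steps

    def expand(self, rows: Iterable[List[float]]) -> List[List[float]]:
        for step in self.steps:
            rows = step.expand(rows)
        return list(rows)


# The "extension library": a registry living outside the core module.
# The core classes above never reference it, so it stays optional.
ML_EXTENSIONS: Dict[str, Callable[[], Transform]] = {
    "center_then_halve": lambda: Composite(Center(), Scale(0.5)),
}

if __name__ == "__main__":
    transform = ML_EXTENSIONS["center_then_halve"]()
    print(transform.expand([[1.0, 2.0], [3.0, 6.0]]))
```

Users who never look up ML_EXTENSIONS pay nothing for it, which is the
property the "not part of the core / required SDK" suggestion is after.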



On Fri, May 20, 2016 at 3:35 PM, Henry Saputra <he...@gmail.com>
wrote:

> I am a bit concerned about adding ML model APIs to Beam because of the
> fluctuating nature of the ML landscape and also because, in reality, most
> data scientists tend to use Python and R for most of their work with
> existing model definitions.
>
> Even though you could say something like Spark ML is popular, that is merely
> because it involves Apache Spark rather than because of the quality of the
> ML module itself.
>
> The pipeline and most of the tooling are inspired by scikit-learn, and
> hence it relies on familiarity with that library to attract developers.
>
> My question is whether fully end-to-end ML APIs are needed as part of the
> core Beam APIs.
>
> - Henry

Re: Fwd: machine learning API, common models

Posted by Henry Saputra <he...@gmail.com>.

I am a bit concerned about adding ML model APIs to Beam because of the
fluctuating nature of the ML landscape and also because, in reality, most
data scientists tend to use Python and R for most of their work with
existing model definitions.

Even though you could say something like Spark ML is popular, that is merely
because it involves Apache Spark rather than because of the quality of the
ML module itself.

The pipeline and most of the tooling are inspired by scikit-learn, and
hence it relies on familiarity with that library to attract developers.

My question is whether fully end-to-end ML APIs are needed as part of the
core Beam APIs.

- Henry

On Thu, May 19, 2016 at 5:46 AM, Jianfeng Qian <qi...@outlook.com>
wrote:

> Hi,
> I am quite interested in this proposal.
> It is great to consider a lot of machine learning projects.
> Currently, most algorithms in Spark MLlib are batch processing, while
> oryx2 and streamDM focus on real-time machine learning.
> And Flink works with the SAMOA team to integrate stream mining algorithms, too.
> So I wonder whether it is possible to design a flexible SDK which allows users
> to call different third-party packages or their own algorithms?
>
> Best,
> Jianfeng
>
> On 2016-05-17 22:01, Suneel Marthi wrote:
> > Thanks Simone for pointing this out.
> >
> > On the Apache Mahout project we have distributed linear algebra with R-like
> > semantics that can be executed on Spark/Flink/H2O.
> >
> > @Kam: the document you point to is old and outdated; the most up-to-date
> > reference to the Samsara API is the book "Apache Mahout: Beyond
> > MapReduce". (shameless marketing here on behalf of fellow committers :) )
> >
> > We added Flink DataSet API support in the recent Mahout 0.12.0 release
> > (April 11, 2016), and it was called out in my talk at ApacheBigData in
> > Vancouver last week.
> >
> > The Mahout community would definitely be interested in being involved with
> > this and sharing notes.
> >
> > IMHO, the focus should first be on building a good linalg foundation
> > before embarking on building algos and pipelines. Adding @dlyubimov to this.
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Simone Robutti <si...@radicalbit.io>
> > Date: Tue, May 17, 2016 at 9:48 AM
> > Subject: Fwd: machine learning API, common models
> > To: Suneel Marthi <sm...@apache.org>
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Kavulya, Soila P <so...@intel.com>
> > Date: 2016-05-17 1:53 GMT+02:00
> > Subject: RE: machine learning API, common models
> > To: "dev@beam.incubator.apache.org" <de...@beam.incubator.apache.org>
> >
> >
> > Thanks Simone,
> >
> > You have raised a valid concern about how different frameworks will have
> > different implementations and parameter semantics for the same algorithm.
> > I agree that it is important to keep this in mind. Hopefully, through this
> > exercise, we will identify a good set of common ML abstractions across
> > different frameworks.
> >
> > Feel free to edit the document. We had limited the first pass of the
> > comparison matrix to the machine learning pipeline APIs, but we can extend
> > it to include other ML building blocks like linear algebra operations, and
> > APIs for optimizers like gradient descent.
> >
> > Soila
> >
> > -----Original Message-----
> > From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> > Sent: Monday, May 16, 2016 8:22 AM
> > To: dev@beam.incubator.apache.org
> > Subject: Re: machine learning API, common models
> >
> > Thanks Simone - yes I had read your concerns on dev and I think they're
> > well founded.
> > Thanks for the Samsara reference - I've been looking at the Spark/Scala
> > bindings:
> > http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> >
> > I think we should expand the document to include linear algebraic ops, or
> > at least pay due diligence to them. If you're doing anything on the Flink
> > side in this regard, let us know or feel free to suggest edits/updates to
> > the document.
> >
> > Thanks
> > Kam
> >
> > On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> > simone.robutti@radicalbit.io> wrote:
> >
> >> Hello,
> >>
> >> I'm Simone and I just began contributing to Flink ML (actually on the
> >> distributed linalg part). I already expressed my concerns about the
> >> idea of a high-level API relying on specific frameworks'
> >> implementations: different implementations produce different results
> >> and may vary in quality. Also, the semantics of parameters may change
> >> from one implementation to another. This could hinder portability and
> >> transparency. I believe these problems could be handled by paying due
> >> attention to the details of every single implementation, but I invite
> >> you not to underestimate them.
> >>
> >> On the other hand, the API in itself looks good to me. From my side, I
> >> hope to fill some of the gaps in Flink that you highlighted in the
> >> comparison matrix.
> >> Talking about matrices, proper matrices this time, I believe it would
> >> be useful to include in this API support for linear algebra operations.
> >> Something similar is already present in Mahout's Samsara and it looks
> >> really good, but clearly a similar implementation on Beam would be way
> >> more interesting and powerful.
> >>
> >> My 2 cents,
> >>
> >> Simone
> >>
> >>
> >> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
> >>
> >>> Hi Tyler,
> >>>
> >>> Thank you so much for your feedback. I agree that starting with the
> >>> high-level API is a good direction. We are interested in Python
> >>> because
> >> it
> >>> is the language that our data scientists are most familiar with. I
> >>> think starting with Java would be the best approach, because the
> >>> Python API can be a thin wrapper for Java API.
> >>>
> >>> In Spark, the Scala, Java and Python APIs are identical. Flink does
> >>> not have a Python API for ML pipelines at present.
> >>>
> >>> Could you point me to the updated runner API?
> >>>
> >>> Soila
> >>>
> >>> -----Original Message-----
> >>> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> >>> Sent: Friday, May 13, 2016 6:34 PM
> >>> To: dev@beam.incubator.apache.org
> >>> Subject: Re: machine learning API, common models
> >>>
> >>> Hi Kam & Soila,
> >>>
> >>> Thanks a lot for writing this up. I ran the doc past some of the
> >>> folks who've been doing ML work here at Google, and they were
> >>> generally happy with the distillation of common methods in the doc.
> >>> I'd be curious to
> >> hear
> >>> what folks on the Flink- and Spark- runner sides think.
> >>>
> >>> To me, this seems like a good direction for a high-level API.
> >>> Presumably, once a high-level API is in place, we could begin
> >>> looking at what it
> >> would
> >>> take to add lower-level ML algorithm support (e.g. iterative) to the
> >>> Beam Model. Is this essentially what you're thinking?
> >>>
> >>> Some more specific questions/comments:
> >>>
> >>>     - Presumably you'd want to tackle this in Java first, since that's
> > the
> >>>     only language we currently support? Given that half of your
> >>> examples are in
> >>>     Python, I'm also assuming Python will be interesting once it's
> >>> available.
> >>>
> >>>     - Along those lines, what languages are represented in the
> capability
> >>>     matrix? E.g. is Spark ML support as detailed there identical across
> >>>     Java/Scala and Python?
> >>>
> >>>     - Have you thought about how this would tie in at the runner level,
> >>>     particularly given the updated Runner API changes that are coming?
> > I'm
> >>>     assuming they'd be provided as composite transforms that (for
> >>> now)
> >> would
> >>>     have no default implementation, given the lack of low-level
> >>> primitives for
> >>>     ML algorithms, but am curious what your thoughts are there.
> >>>
> >>>     - I still don't fully understand how incremental updates due to
> model
> >>>     drift would tie in at the API level. There's a comment thread in
> >>> the
> >> doc
> >>>     still open tracking this, so no need to comment here additionally.
> >> Just
> >>>     pointing it out as one of the things that stands out as
> >>> potentially having
> >>>     API-level impacts to me that doesn't seem 100% fleshed out in the
> >>> doc yet
> >>>     (though that admittedly may just be my limited understanding at
> >>> this point
> >>>     :-).
> >>>
> >>> -Tyler
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com>
> >> wrote:
> >>>> Hi Tyler - my bad. Comments should be enabled now.
> >>>>
> >>>> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> >>>> <takidau@google.com.invalid
> >>>> wrote:
> >>>>
> >>>>> Thanks a lot, Kam. Can you please enable comment access on the doc?
> >>>>> I
> >>>> seem
> >>>>> to have view access only.
> >>>>>
> >>>>> -Tyler
> >>>>>
> >>>>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi
> >>>>> <ka...@gmail.com>
> >>>> wrote:
> >>>>>> Hi
> >>>>>>
> >>>>>> A number of readers have made comments on this topic recently.
> >>>>>> We have created a document that does some analysis of common
> >>>>>> ML models and
> >>>>> related
> >>>>>> APIs. We hope this can drive an approach that will result in
> >>>>>> an API, compatibility matrix and involvement from the same
> >>>>>> groups that are implementing transformation runners (spark,
> > flink, etc).
> >>>>>> We welcome comments here or in the document itself.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
>
>

Re: Fwd: machine learning API, common models

Posted by Jianfeng Qian <qi...@outlook.com>.

Hi,
I am quite interested in this proposal.
It is great that it considers so many machine learning projects.
Currently, most algorithms in Spark MLlib are batch-oriented, while
Oryx 2 and streamDM focus on real-time machine learning.
Flink also works with the SAMOA team to integrate stream mining algorithms.
So I wonder whether it is possible to design a flexible SDK that allows users
to call different third-party packages or their own algorithms?
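
To make the question concrete, here is a minimal sketch of what such a pluggable
layer might look like. It is only an illustration under stated assumptions: the
names Model, register_trainer, train and MeanModel are hypothetical and are not
part of Beam, Spark, Flink, Oryx or SAMOA.

```python
from typing import Callable, Dict, List, Protocol, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, label)


class Model(Protocol):
    """Hypothetical minimal model contract: one prediction per feature vector."""

    def predict(self, features: Sequence[float]) -> float: ...


# A trainer is any callable that turns labelled rows into a Model.
Trainer = Callable[[List[Row]], Model]

# Hypothetical registry mapping a name to a third-party or user-written trainer.
_TRAINERS: Dict[str, Trainer] = {}


def register_trainer(name: str, trainer: Trainer) -> None:
    """Make a trainer available to callers under the given name."""
    _TRAINERS[name] = trainer


def train(name: str, rows: List[Row]) -> Model:
    """Look up the named trainer and run it on the rows."""
    if name not in _TRAINERS:
        raise KeyError(f"no trainer registered under {name!r}")
    return _TRAINERS[name](rows)


class MeanModel:
    """Toy user algorithm: always predicts the mean training label."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def predict(self, features: Sequence[float]) -> float:
        return self.mean


def train_mean(rows: List[Row]) -> MeanModel:
    labels = [label for _, label in rows]
    return MeanModel(sum(labels) / len(labels))


register_trainer("user.mean", train_mean)
model = train("user.mean", [([1.0], 2.0), ([2.0], 4.0)])
print(model.predict([3.0]))  # 3.0
```

The point of the sketch is that callers depend only on the small Model contract,
so a wrapper around an existing package and a hand-written algorithm would be
registered in the same way.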

Best,
Jianfeng

On 2016-05-17 22:01, Suneel Marthi wrote:
> Thanks Simone for pointing this out.
>
> On the Apache Mahout project we have distributed linear algebra with R-like
> semantics that can be executed on Spark/Flink/H2O.
>
> @Kam: the document you point to is old and outdated; the most up-to-date
> reference for the Samsara API is the book "Apache Mahout: Beyond
> MapReduce". (shameless marketing here on behalf of fellow committers :) )
>
> We added Flink DataSet API support in the recent Mahout 0.12.0 release (April 11,
> 2016), and it was called out in my talk at ApacheBigData in Vancouver last
> week.
>
> The Mahout community would definitely be interested in being involved with
> this and sharing notes.
>
> IMHO, the focus should first be on building a good linalg foundation
> before embarking on building algos and pipelines. Adding @dlyubimov to this.
>
>
>
> ---------- Forwarded message ----------
> From: Simone Robutti <si...@radicalbit.io>
> Date: Tue, May 17, 2016 at 9:48 AM
> Subject: Fwd: machine learning API, common models
> To: Suneel Marthi <sm...@apache.org>
>
>
>
> ---------- Forwarded message ----------
> From: Kavulya, Soila P <so...@intel.com>
> Date: 2016-05-17 1:53 GMT+02:00
> Subject: RE: machine learning API, common models
> To: "dev@beam.incubator.apache.org" <de...@beam.incubator.apache.org>
>
>
> Thanks Simone,
>
> You have raised a valid concern about how different frameworks will have
> different implementations and parameter semantics for the same algorithm. I
> agree that it is important to keep this in mind. Hopefully, through this
> exercise, we will identify a good set of common ML abstractions across
> different frameworks.
>
> Feel free to edit the document. We had limited the first pass of the
> comparison matrix to the machine learning pipeline APIs, but we can extend
> it to include other ML building blocks like linear algebra operations, and
> APIs for optimizers like gradient descent.
>
> Soila
>
> -----Original Message-----
> From: Kam Kasravi [mailto:kamkasravi@gmail.com]
> Sent: Monday, May 16, 2016 8:22 AM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Thanks Simone - yes I had read your concerns on dev and I think they're
> well founded.
> Thanks for the Samsara reference - I've been looking at the Spark/Scala
> bindings: http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
>
> I think we should expand the document to include linear algebraic ops, or at
> least pay due diligence to them. If you're doing anything on the Flink side
> in this regard, let us know or feel free to suggest edits/updates to the document.
>
> Thanks
> Kam
>
> On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
> simone.robutti@radicalbit.io> wrote:
>
>> Hello,
>>
>> I'm Simone and I just began contributing to Flink ML (actually on the
>> distributed linalg part). I already expressed my concerns about the
>> idea of a high-level API relying on specific frameworks' implementations:
>> different implementations produce different results and may vary in
>> quality. Also, the semantics of parameters may change from one
>> implementation to the other. This could hinder portability and
>> transparency. I believe these problems could be handled by paying due
>> attention to the details of every single implementation, but I invite
>> you not to underestimate them.
>>
>> On the other hand the API in itself looks good to me. From my side, I
>> hope to fill some of the gaps in Flink you underlined in the comparison
> matrix.
>> Talking about matrices, proper matrices this time, I believe it would
>> be useful to include in this API support for linear algebra operations.
>> Something similar is already present in Mahout's Samsara and it looks
>> really good but clearly a similar implementation on Beam would be way
>> more interesting and powerful.
>>
>> My 2 cents,
>>
>> Simone
>>
>>
>> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
>>
>>> Hi Tyler,
>>>
>>> Thank you so much for your feedback. I agree that starting with the
>>> high-level API is a good direction. We are interested in Python
>>> because
>> it
>>> is the language that our data scientists are most familiar with. I
>>> think starting with Java would be the best approach, because the
>>> Python API can be a thin wrapper for Java API.
>>>
>>> In Spark, the Scala, Java and Python APIs are identical. Flink does
>>> not have a Python API for ML pipelines at present.
>>>
>>> Could you point me to the updated runner API?
>>>
>>> Soila
>>>
>>> -----Original Message-----
>>> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
>>> Sent: Friday, May 13, 2016 6:34 PM
>>> To: dev@beam.incubator.apache.org
>>> Subject: Re: machine learning API, common models
>>>
>>> Hi Kam & Soila,
>>>
>>> Thanks a lot for writing this up. I ran the doc past some of the
>>> folks who've been doing ML work here at Google, and they were
>>> generally happy with the distillation of common methods in the doc.
>>> I'd be curious to
>> hear
>>> what folks on the Flink- and Spark- runner sides think.
>>>
>>> To me, this seems like a good direction for a high-level API.
>>> Presumably, once a high-level API is in place, we could begin
>>> looking at what it
>> would
>>> take to add lower-level ML algorithm support (e.g. iterative) to the
>>> Beam Model. Is this essentially what you're thinking?
>>>
>>> Some more specific questions/comments:
>>>
>>>     - Presumably you'd want to tackle this in Java first, since that's
> the
>>>     only language we currently support? Given that half of your
>>> examples are in
>>>     Python, I'm also assuming Python will be interesting once it's
>>> available.
>>>
>>>     - Along those lines, what languages are represented in the capability
>>>     matrix? E.g. is Spark ML support as detailed there identical across
>>>     Java/Scala and Python?
>>>
>>>     - Have you thought about how this would tie in at the runner level,
>>>     particularly given the updated Runner API changes that are coming?
> I'm
>>>     assuming they'd be provided as composite transforms that (for
>>> now)
>> would
>>>     have no default implementation, given the lack of low-level
>>> primitives for
>>>     ML algorithms, but am curious what your thoughts are there.
>>>
>>>     - I still don't fully understand how incremental updates due to model
>>>     drift would tie in at the API level. There's a comment thread in
>>> the
>> doc
>>>     still open tracking this, so no need to comment here additionally.
>> Just
>>>     pointing it out as one of the things that stands out as
>>> potentially having
>>>     API-level impacts to me that doesn't seem 100% fleshed out in the
>>> doc yet
>>>     (though that admittedly may just be my limited understanding at
>>> this point
>>>     :-).
>>>
>>> -Tyler
>>>
>>>
>>>
>>>
>>> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com>
>> wrote:
>>>> Hi Tyler - my bad. Comments should be enabled now.
>>>>
>>>> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
>>>> <takidau@google.com.invalid
>>>> wrote:
>>>>
>>>>> Thanks a lot, Kam. Can you please enable comment access on the doc?
>>>>> I
>>>> seem
>>>>> to have view access only.
>>>>>
>>>>> -Tyler
>>>>>
>>>>> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi
>>>>> <ka...@gmail.com>
>>>> wrote:
>>>>>> Hi
>>>>>>
>>>>>> A number of readers have made comments on this topic recently.
>>>>>> We have created a document that does some analysis of common
>>>>>> ML models and
>>>>> related
>>>>>> APIs. We hope this can drive an approach that will result in
>>>>>> an API, compatibility matrix and involvement from the same
>>>>>> groups that are implementing transformation runners (spark,
> flink, etc).
>>>>>> We welcome comments here or in the document itself.
>>>>>>
>>>>>>
>>>>>>
>>>> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing

Fwd: machine learning API, common models

Posted by Suneel Marthi <sm...@apache.org>.

Thanks Simone for pointing this out.

On the Apache Mahout project we have distributed linear algebra with R-like
semantics that can be executed on Spark/Flink/H2O.

@Kam: the document you point to is old and outdated; the most up-to-date
reference for the Samsara API is the book "Apache Mahout: Beyond
MapReduce". (shameless marketing here on behalf of fellow committers :) )

We added Flink DataSet API support in the recent Mahout 0.12.0 release (April 11,
2016), and it was called out in my talk at ApacheBigData in Vancouver last
week.

The Mahout community would definitely be interested in being involved with
this and sharing notes.

IMHO, the focus should first be on building a good linalg foundation
before embarking on building algos and pipelines. Adding @dlyubimov to this.



---------- Forwarded message ----------
From: Simone Robutti <si...@radicalbit.io>
Date: Tue, May 17, 2016 at 9:48 AM
Subject: Fwd: machine learning API, common models
To: Suneel Marthi <sm...@apache.org>



---------- Forwarded message ----------
From: Kavulya, Soila P <so...@intel.com>
Date: 2016-05-17 1:53 GMT+02:00
Subject: RE: machine learning API, common models
To: "dev@beam.incubator.apache.org" <de...@beam.incubator.apache.org>


Thanks Simone,

You have raised a valid concern about how different frameworks will have
different implementations and parameter semantics for the same algorithm. I
agree that it is important to keep this in mind. Hopefully, through this
exercise, we will identify a good set of common ML abstractions across
different frameworks.

Feel free to edit the document. We had limited the first pass of the
comparison matrix to the machine learning pipeline APIs, but we can extend
it to include other ML building blocks like linear algebra operations, and
APIs for optimizers like gradient descent.

Soila

-----Original Message-----
From: Kam Kasravi [mailto:kamkasravi@gmail.com]
Sent: Monday, May 16, 2016 8:22 AM
To: dev@beam.incubator.apache.org
Subject: Re: machine learning API, common models

Thanks Simone - yes I had read your concerns on dev and I think they're
well founded.
Thanks for the Samsara reference - I've been looking at the Spark/Scala
bindings: http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf

I think we should expand the document to include linear algebraic ops, or at
least pay due diligence to them. If you're doing anything on the Flink side
in this regard, let us know or feel free to suggest edits/updates to the document.

Thanks
Kam

On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
simone.robutti@radicalbit.io> wrote:

> Hello,
>
> I'm Simone and I just began contributing to Flink ML (actually on the
> distributed linalg part). I already expressed my concerns about the
> idea of a high-level API relying on specific frameworks' implementations:
> different implementations produce different results and may vary in
> quality. Also, the semantics of parameters may change from one
> implementation to the other. This could hinder portability and
> transparency. I believe these problems could be handled by paying due
> attention to the details of every single implementation, but I invite
> you not to underestimate them.
>
> On the other hand the API in itself looks good to me. From my side, I
> hope to fill some of the gaps in Flink you underlined in the comparison
matrix.
>
> Talking about matrices, proper matrices this time, I believe it would
> be useful to include in this API support for linear algebra operations.
> Something similar is already present in Mahout's Samsara and it looks
> really good but clearly a similar implementation on Beam would be way
> more interesting and powerful.
>
> My 2 cents,
>
> Simone
>
>
> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
>
> > Hi Tyler,
> >
> > Thank you so much for your feedback. I agree that starting with the
> > high-level API is a good direction. We are interested in Python
> > because
> it
> > is the language that our data scientists are most familiar with. I
> > think starting with Java would be the best approach, because the
> > Python API can be a thin wrapper for Java API.
> >
> > In Spark, the Scala, Java and Python APIs are identical. Flink does
> > not have a Python API for ML pipelines at present.
> >
> > Could you point me to the updated runner API?
> >
> > Soila
> >
> > -----Original Message-----
> > From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > Sent: Friday, May 13, 2016 6:34 PM
> > To: dev@beam.incubator.apache.org
> > Subject: Re: machine learning API, common models
> >
> > Hi Kam & Soila,
> >
> > Thanks a lot for writing this up. I ran the doc past some of the
> > folks who've been doing ML work here at Google, and they were
> > generally happy with the distillation of common methods in the doc.
> > I'd be curious to
> hear
> > what folks on the Flink- and Spark- runner sides think.
> >
> > To me, this seems like a good direction for a high-level API.
> > Presumably, once a high-level API is in place, we could begin
> > looking at what it
> would
> > take to add lower-level ML algorithm support (e.g. iterative) to the
> > Beam Model. Is this essentially what you're thinking?
> >
> > Some more specific questions/comments:
> >
> >    - Presumably you'd want to tackle this in Java first, since that's
the
> >    only language we currently support? Given that half of your
> > examples are in
> >    Python, I'm also assuming Python will be interesting once it's
> > available.
> >
> >    - Along those lines, what languages are represented in the capability
> >    matrix? E.g. is Spark ML support as detailed there identical across
> >    Java/Scala and Python?
> >
> >    - Have you thought about how this would tie in at the runner level,
> >    particularly given the updated Runner API changes that are coming?
I'm
> >    assuming they'd be provided as composite transforms that (for
> > now)
> would
> >    have no default implementation, given the lack of low-level
> > primitives for
> >    ML algorithms, but am curious what your thoughts are there.
> >
> >    - I still don't fully understand how incremental updates due to model
> >    drift would tie in at the API level. There's a comment thread in
> > the
> doc
> >    still open tracking this, so no need to comment here additionally.
> Just
> >    pointing it out as one of the things that stands out as
> > potentially having
> >    API-level impacts to me that doesn't seem 100% fleshed out in the
> > doc yet
> >    (though that admittedly may just be my limited understanding at
> > this point
> >    :-).
> >
> > -Tyler
> >
> >
> >
> >
> > On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com>
> wrote:
> >
> > > Hi Tyler - my bad. Comments should be enabled now.
> > >
> > > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > <takidau@google.com.invalid
> > > >
> > > wrote:
> > >
> > > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > I
> > > seem
> > > > to have view access only.
> > > >
> > > > -Tyler
> > > >
> > > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi
> > > > <ka...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > A number of readers have made comments on this topic recently.
> > > > > We have created a document that does some analysis of common
> > > > > ML models and
> > > > related
> > > > > APIs. We hope this can drive an approach that will result in
> > > > > an API, compatibility matrix and involvement from the same
> > > > > groups that are implementing transformation runners (spark,
flink, etc).
> > > > > We welcome comments here or in the document itself.
> > > > >
> > > > >
> > > > >
> > > >
> > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > > >
> > > >
> > >
> >
>

RE: machine learning API, common models

Posted by "Kavulya, Soila P" <so...@intel.com>.

Thanks Simone,

You have raised a valid concern about how different frameworks will have different implementations and parameter semantics for the same algorithm. I agree that it is important to keep this in mind. Hopefully, through this exercise, we will identify a good set of common ML abstractions across different frameworks.

Feel free to edit the document. We had limited the first pass of the comparison matrix to the machine learning pipeline APIs, but we can extend it to include other ML building blocks like linear algebra operations, and APIs for optimizers like gradient descent. 
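
As an illustration of the parameter-semantics concern, here is a minimal sketch,
assuming two hypothetical backends that express regularization differently: one
takes the strength directly and the other takes its inverse. The names
CommonParams, to_backend_a and to_backend_b, and the keys "reg_strength" and
"inverse_reg", are invented for this example and do not describe any real
framework's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CommonParams:
    """Hypothetical portable parameters for a regularized linear model."""

    # Larger value means stronger regularization; must be positive.
    regularization: float


def to_backend_a(params: CommonParams) -> Dict[str, float]:
    """Hypothetical backend A takes the regularization strength directly."""
    return {"reg_strength": params.regularization}


def to_backend_b(params: CommonParams) -> Dict[str, float]:
    """Hypothetical backend B takes the inverse of the strength."""
    if params.regularization <= 0:
        raise ValueError("regularization must be positive to invert it")
    return {"inverse_reg": 1.0 / params.regularization}


common = CommonParams(regularization=0.5)
print(to_backend_a(common))  # {'reg_strength': 0.5}
print(to_backend_b(common))  # {'inverse_reg': 2.0}
```

A real mapping would likely need more than a reciprocal (for example, scaling
by the number of samples), which is why each translation would have to be
checked against the details of the specific implementation.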

Soila

-----Original Message-----
From: Kam Kasravi [mailto:kamkasravi@gmail.com] 
Sent: Monday, May 16, 2016 8:22 AM
To: dev@beam.incubator.apache.org
Subject: Re: machine learning API, common models

Thanks Simone - yes I had read your concerns on dev and I think they're well founded.
Thanks for the Samsara reference - I've been looking at the Spark/Scala bindings: http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf

I think we should expand the document to include linear algebraic ops, or at least pay due diligence to them. If you're doing anything on the Flink side in this regard, let us know or feel free to suggest edits/updates to the document.

Thanks
Kam

On Mon, May 16, 2016 at 6:05 AM, Simone Robutti < simone.robutti@radicalbit.io> wrote:

> Hello,
>
> I'm Simone and I just began contributing to Flink ML (actually on the
> distributed linalg part). I already expressed my concerns about the
> idea of a high-level API relying on specific frameworks' implementations:
> different implementations produce different results and may vary in
> quality. Also, the semantics of parameters may change from one
> implementation to the other. This could hinder portability and
> transparency. I believe these problems could be handled by paying due
> attention to the details of every single implementation, but I invite
> you not to underestimate them.
>
> On the other hand the API in itself looks good to me. From my side, I 
> hope to fill some of the gaps in Flink you underlined in the comparison matrix.
>
> Talking about matrices, proper matrices this time, I believe it would 
> be useful to include in this API support for linear algebra operations.
> Something similar is already present in Mahout's Samsara and it looks 
> really good but clearly a similar implementation on Beam would be way 
> more interesting and powerful.
>
> My 2 cents,
>
> Simone
>
>
> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
>
> > Hi Tyler,
> >
> > Thank you so much for your feedback. I agree that starting with the
> > high-level API is a good direction. We are interested in Python
> > because it is the language that our data scientists are most familiar
> > with. I think starting with Java would be the best approach, because
> > the Python API can be a thin wrapper for the Java API.
> >
> > In Spark, the Scala, Java and Python APIs are identical. Flink does 
> > not have a Python API for ML pipelines at present.
> >
> > Could you point me to the updated runner API?
> >
> > Soila
> >
> > -----Original Message-----
> > From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > Sent: Friday, May 13, 2016 6:34 PM
> > To: dev@beam.incubator.apache.org
> > Subject: Re: machine learning API, common models
> >
> > Hi Kam & Soila,
> >
> > Thanks a lot for writing this up. I ran the doc past some of the
> > folks who've been doing ML work here at Google, and they were
> > generally happy with the distillation of common methods in the doc.
> > I'd be curious to hear what folks on the Flink- and Spark- runner
> > sides think.
> >
> > To me, this seems like a good direction for a high-level API.
> > Presumably, once a high-level API is in place, we could begin
> > looking at what it would take to add lower-level ML algorithm
> > support (e.g. iterative) to the Beam Model. Is this essentially what
> > you're thinking?
> >
> > Some more specific questions/comments:
> >
> >    - Presumably you'd want to tackle this in Java first, since that's the
> >    only language we currently support? Given that half of your examples
> >    are in Python, I'm also assuming Python will be interesting once it's
> >    available.
> >
> >    - Along those lines, what languages are represented in the capability
> >    matrix? E.g. is Spark ML support as detailed there identical across
> >    Java/Scala and Python?
> >
> >    - Have you thought about how this would tie in at the runner level,
> >    particularly given the updated Runner API changes that are coming? I'm
> >    assuming they'd be provided as composite transforms that (for now)
> >    would have no default implementation, given the lack of low-level
> >    primitives for ML algorithms, but am curious what your thoughts are
> >    there.
> >
> >    - I still don't fully understand how incremental updates due to model
> >    drift would tie in at the API level. There's a comment thread in the
> >    doc still open tracking this, so no need to comment here additionally.
> >    Just pointing it out as one of the things that stands out as
> >    potentially having API-level impacts to me that doesn't seem 100%
> >    fleshed out in the doc yet (though that admittedly may just be my
> >    limited understanding at this point :-).
> >
> > -Tyler
> >
> >
> >
> >
> > On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com> wrote:
> >
> > > Hi Tyler - my bad. Comments should be enabled now.
> > >
> > > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > <takidau@google.com.invalid> wrote:
> > >
> > > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > I seem to have view access only.
> > > >
> > > > -Tyler
> > > >
> > > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > A number of readers have made comments on this topic recently.
> > > > > We have created a document that does some analysis of common
> > > > > ML models and related APIs. We hope this can drive an approach
> > > > > that will result in an API, compatibility matrix and
> > > > > involvement from the same groups that are implementing
> > > > > transformation runners (spark, flink, etc). We welcome
> > > > > comments here or in the document itself.
> > > > >
> > > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > > >
> > > >
> > >
> >
>

Re: machine learning API, common models

Posted by Kam Kasravi <ka...@gmail.com>.

Thanks Simone - yes I had read your concerns on dev and I think they're
well founded.
Thanks for the Samsara reference - I've been looking at the Spark/Scala
bindings
http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf.

I think we should expand the document to include linear algebraic ops, or
at least pay due diligence to them. If you're doing anything on the Flink
side in this regard let us know, or feel free to suggest edits/updates to
the document.

Thanks
Kam

On Mon, May 16, 2016 at 6:05 AM, Simone Robutti <
simone.robutti@radicalbit.io> wrote:

> Hello,
>
> I'm Simone and I just began contributing to Flink ML (actually on the
> distributed linalg part). I already expressed my concerns about the idea of
> a high-level API relying on specific frameworks' implementations:
> different implementations produce different results and may vary in
> quality. Also the semantics of parameters may change from one
> implementation to the other. This could hinder portability and
> transparency. I believe these problems could be handled paying the due
> attention to the details of every single implementation but I invite you
> not to underestimate these problems.
>
> On the other hand the API in itself looks good to me. From my side, I hope
> to fill some of the gaps in Flink you underlined in the comparison matrix.
>
> Talking about matrices, proper matrices this time, I believe it would be
> useful to include in this API support for linear algebra operations.
> Something similar is already present in Mahout's Samsara and it looks
> really good but clearly a similar implementation on Beam would be way more
> interesting and powerful.
>
> My 2 cents,
>
> Simone
>
>
> 2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:
>
> > Hi Tyler,
> >
> > Thank you so much for your feedback. I agree that starting with the
> > high-level API is a good direction. We are interested in Python because
> > it is the language that our data scientists are most familiar with. I
> > think starting with Java would be the best approach, because the Python
> > API can be a thin wrapper for the Java API.
> >
> > In Spark, the Scala, Java and Python APIs are identical. Flink does not
> > have a Python API for ML pipelines at present.
> >
> > Could you point me to the updated runner API?
> >
> > Soila
> >
> > -----Original Message-----
> > From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> > Sent: Friday, May 13, 2016 6:34 PM
> > To: dev@beam.incubator.apache.org
> > Subject: Re: machine learning API, common models
> >
> > Hi Kam & Soila,
> >
> > Thanks a lot for writing this up. I ran the doc past some of the folks
> > who've been doing ML work here at Google, and they were generally happy
> > with the distillation of common methods in the doc. I'd be curious to
> > hear what folks on the Flink- and Spark- runner sides think.
> >
> > To me, this seems like a good direction for a high-level API. Presumably,
> > once a high-level API is in place, we could begin looking at what it
> > would take to add lower-level ML algorithm support (e.g. iterative) to
> > the Beam Model. Is this essentially what you're thinking?
> >
> > Some more specific questions/comments:
> >
> >    - Presumably you'd want to tackle this in Java first, since that's the
> >    only language we currently support? Given that half of your examples
> >    are in Python, I'm also assuming Python will be interesting once it's
> >    available.
> >
> >    - Along those lines, what languages are represented in the capability
> >    matrix? E.g. is Spark ML support as detailed there identical across
> >    Java/Scala and Python?
> >
> >    - Have you thought about how this would tie in at the runner level,
> >    particularly given the updated Runner API changes that are coming? I'm
> >    assuming they'd be provided as composite transforms that (for now)
> >    would have no default implementation, given the lack of low-level
> >    primitives for ML algorithms, but am curious what your thoughts are
> >    there.
> >
> >    - I still don't fully understand how incremental updates due to model
> >    drift would tie in at the API level. There's a comment thread in the
> >    doc still open tracking this, so no need to comment here additionally.
> >    Just pointing it out as one of the things that stands out as
> >    potentially having API-level impacts to me that doesn't seem 100%
> >    fleshed out in the doc yet (though that admittedly may just be my
> >    limited understanding at this point :-).
> >
> > -Tyler
> >
> >
> >
> >
> > On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com> wrote:
> >
> > > Hi Tyler - my bad. Comments should be enabled now.
> > >
> > > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > > <takidau@google.com.invalid> wrote:
> > >
> > > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > > I seem to have view access only.
> > > >
> > > > -Tyler
> > > >
> > > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > A number of readers have made comments on this topic recently. We
> > > > > have created a document that does some analysis of common ML
> > > > > models and related APIs. We hope this can drive an approach that
> > > > > will result in an API, compatibility matrix and involvement from
> > > > > the same groups that are implementing transformation runners
> > > > > (spark, flink, etc). We welcome comments here or in the document
> > > > > itself.
> > > > >
> > > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > > >
> > > >
> > >
> >
>

Re: machine learning API, common models

Posted by Simone Robutti <si...@radicalbit.io>.

Hello,

I'm Simone and I just began contributing to Flink ML (actually on the
distributed linalg part). I already expressed my concerns about the idea of
a high-level API relying on specific frameworks' implementations:
different implementations produce different results and may vary in
quality. Also the semantics of parameters may change from one
implementation to the other. This could hinder portability and
transparency. I believe these problems could be handled by paying due
attention to the details of every single implementation, but I invite you
not to underestimate them.

On the other hand the API in itself looks good to me. From my side, I hope
to fill some of the gaps in Flink you underlined in the comparison matrix.

Talking about matrices, proper matrices this time, I believe it would be
useful to include in this API support for linear algebra operations.
Something similar is already present in Mahout's Samsara and it looks
really good but clearly a similar implementation on Beam would be way more
interesting and powerful.

My 2 cents,

Simone
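
A minimal sketch of the parameter-semantics concern raised above, under stated assumptions: both "backends" are hypothetical stand-ins, not real libraries, and the names PortableParams, to_backend_a and to_backend_b are invented for illustration. The keyword names are only borrowed for flavor from Spark ML (regParam) and scikit-learn (C, which is documented as an inverse regularization strength); a real conversion would likely need more care than a plain reciprocal.

```python
# Hypothetical illustration: neither backend below is a real library.
from dataclasses import dataclass


@dataclass(frozen=True)
class PortableParams:
    """Backend-neutral hyperparameters (hypothetical names)."""
    reg_strength: float  # larger value means stronger regularization
    max_iterations: int


def to_backend_a(p: PortableParams) -> dict:
    # Backend A (hypothetical) takes the regularization strength directly.
    return {"regParam": p.reg_strength, "maxIter": p.max_iterations}


def to_backend_b(p: PortableParams) -> dict:
    # Backend B (hypothetical) takes the inverse of the strength, so passing
    # the raw value through unchanged would silently change the model.
    if p.reg_strength <= 0:
        raise ValueError("reg_strength must be positive to invert it")
    return {"C": 1.0 / p.reg_strength, "max_iter": p.max_iterations}


params = PortableParams(reg_strength=0.5, max_iterations=100)
print(to_backend_a(params))
print(to_backend_b(params))
```

The point of the sketch is only that a portable API has to own the meaning of each parameter and translate it per implementation, rather than forwarding values by name.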


2016-05-14 4:53 GMT+02:00 Kavulya, Soila P <so...@intel.com>:

> Hi Tyler,
>
> Thank you so much for your feedback. I agree that starting with the
> high-level API is a good direction. We are interested in Python because it
> is the language that our data scientists are most familiar with. I think
> starting with Java would be the best approach, because the Python API can
> be a thin wrapper for Java API.
>
> In Spark, the Scala, Java and Python APIs are identical. Flink does not
> have a Python API for ML pipelines at present.
>
> Could you point me to the updated runner API?
>
> Soila
>
> -----Original Message-----
> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> Sent: Friday, May 13, 2016 6:34 PM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Hi Kam & Soila,
>
> Thanks a lot for writing this up. I ran the doc past some of the folks
> who've been doing ML work here at Google, and they were generally happy
> with the distillation of common methods in the doc. I'd be curious to hear
> what folks on the Flink- and Spark- runner sides think.
>
> To me, this seems like a good direction for a high-level API. Presumably,
> once a high-level API is in place, we could begin looking at what it would
> take to add lower-level ML algorithm support (e.g. iterative) to the Beam
> Model. Is this essentially what you're thinking?
>
> Some more specific questions/comments:
>
>    - Presumably you'd want to tackle this in Java first, since that's the
>    only language we currently support? Given that half of your examples
>    are in Python, I'm also assuming Python will be interesting once it's
>    available.
>
>    - Along those lines, what languages are represented in the capability
>    matrix? E.g. is Spark ML support as detailed there identical across
>    Java/Scala and Python?
>
>    - Have you thought about how this would tie in at the runner level,
>    particularly given the updated Runner API changes that are coming? I'm
>    assuming they'd be provided as composite transforms that (for now) would
>    have no default implementation, given the lack of low-level primitives
>    for ML algorithms, but am curious what your thoughts are there.
>
>    - I still don't fully understand how incremental updates due to model
>    drift would tie in at the API level. There's a comment thread in the doc
>    still open tracking this, so no need to comment here additionally. Just
>    pointing it out as one of the things that stands out as potentially
>    having API-level impacts to me that doesn't seem 100% fleshed out in the
>    doc yet (though that admittedly may just be my limited understanding at
>    this point :-).
>
> -Tyler
>
>
>
>
> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com> wrote:
>
> > Hi Tyler - my bad. Comments should be enabled now.
> >
> > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > <takidau@google.com.invalid> wrote:
> >
> > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > I seem to have view access only.
> > >
> > > -Tyler
> > >
> > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
> > >
> > > > Hi
> > > >
> > > > A number of readers have made comments on this topic recently. We
> > > > have created a document that does some analysis of common ML
> > > > models and related APIs. We hope this can drive an approach that
> > > > will result in an API, compatibility matrix and involvement from
> > > > the same groups that are implementing transformation runners
> > > > (spark, flink, etc). We welcome comments here or in the document
> > > > itself.
> > > >
> > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > >
> > >
> >
>

Re: machine learning API, common models

Posted by Tyler Akidau <ta...@google.com.INVALID>.

On Sat, May 14, 2016 at 4:53 AM Kavulya, Soila P <so...@intel.com>
wrote:

> Hi Tyler,
>
> Thank you so much for your feedback. I agree that starting with the
> high-level API is a good direction. We are interested in Python because it
> is the language that our data scientists are most familiar with. I think
> starting with Java would be the best approach, because the Python API can
> be a thin wrapper for Java API.
>
> In Spark, the Scala, Java and Python APIs are identical. Flink does not
> have a Python API for ML pipelines at present.
>
> Could you point me to the updated runner API?
>

Sorry for the delay; I've been traveling. The runner API proposal is here:
https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit

-Tyler


>
> Soila
>
> -----Original Message-----
> From: Tyler Akidau [mailto:takidau@google.com.INVALID]
> Sent: Friday, May 13, 2016 6:34 PM
> To: dev@beam.incubator.apache.org
> Subject: Re: machine learning API, common models
>
> Hi Kam & Soila,
>
> Thanks a lot for writing this up. I ran the doc past some of the folks
> who've been doing ML work here at Google, and they were generally happy
> with the distillation of common methods in the doc. I'd be curious to hear
> what folks on the Flink- and Spark- runner sides think.
>
> To me, this seems like a good direction for a high-level API. Presumably,
> once a high-level API is in place, we could begin looking at what it would
> take to add lower-level ML algorithm support (e.g. iterative) to the Beam
> Model. Is this essentially what you're thinking?
>
> Some more specific questions/comments:
>
>    - Presumably you'd want to tackle this in Java first, since that's the
>    only language we currently support? Given that half of your examples
>    are in Python, I'm also assuming Python will be interesting once it's
>    available.
>
>    - Along those lines, what languages are represented in the capability
>    matrix? E.g. is Spark ML support as detailed there identical across
>    Java/Scala and Python?
>
>    - Have you thought about how this would tie in at the runner level,
>    particularly given the updated Runner API changes that are coming? I'm
>    assuming they'd be provided as composite transforms that (for now) would
>    have no default implementation, given the lack of low-level primitives
>    for ML algorithms, but am curious what your thoughts are there.
>
>    - I still don't fully understand how incremental updates due to model
>    drift would tie in at the API level. There's a comment thread in the doc
>    still open tracking this, so no need to comment here additionally. Just
>    pointing it out as one of the things that stands out as potentially
>    having API-level impacts to me that doesn't seem 100% fleshed out in the
>    doc yet (though that admittedly may just be my limited understanding at
>    this point :-).
>
> -Tyler
>
>
>
>
> On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com> wrote:
>
> > Hi Tyler - my bad. Comments should be enabled now.
> >
> > On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> > <takidau@google.com.invalid> wrote:
> >
> > > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > > I seem to have view access only.
> > >
> > > -Tyler
> > >
> > > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
> > >
> > > > Hi
> > > >
> > > > A number of readers have made comments on this topic recently. We
> > > > have created a document that does some analysis of common ML
> > > > models and related APIs. We hope this can drive an approach that
> > > > will result in an API, compatibility matrix and involvement from
> > > > the same groups that are implementing transformation runners
> > > > (spark, flink, etc). We welcome comments here or in the document
> > > > itself.
> > > >
> > > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > > >
> > >
> >
>

RE: machine learning API, common models

Posted by "Kavulya, Soila P" <so...@intel.com>.

Hi Tyler,

Thank you so much for your feedback. I agree that starting with the high-level API is a good direction. We are interested in Python because it is the language that our data scientists are most familiar with. I think starting with Java would be the best approach, because the Python API can be a thin wrapper for the Java API.

In Spark, the Scala, Java and Python APIs are identical. Flink does not have a Python API for ML pipelines at present.

Could you point me to the updated runner API?

Soila

-----Original Message-----
From: Tyler Akidau [mailto:takidau@google.com.INVALID] 
Sent: Friday, May 13, 2016 6:34 PM
To: dev@beam.incubator.apache.org
Subject: Re: machine learning API, common models

Hi Kam & Soila,

Thanks a lot for writing this up. I ran the doc past some of the folks who've been doing ML work here at Google, and they were generally happy with the distillation of common methods in the doc. I'd be curious to hear what folks on the Flink- and Spark- runner sides think.

To me, this seems like a good direction for a high-level API. Presumably, once a high-level API is in place, we could begin looking at what it would take to add lower-level ML algorithm support (e.g. iterative) to the Beam Model. Is this essentially what you're thinking?

Some more specific questions/comments:

   - Presumably you'd want to tackle this in Java first, since that's the
   only language we currently support? Given that half of your examples are in
   Python, I'm also assuming Python will be interesting once it's available.

   - Along those lines, what languages are represented in the capability
   matrix? E.g. is Spark ML support as detailed there identical across
   Java/Scala and Python?

   - Have you thought about how this would tie in at the runner level,
   particularly given the updated Runner API changes that are coming? I'm
   assuming they'd be provided as composite transforms that (for now) would
   have no default implementation, given the lack of low-level primitives for
   ML algorithms, but am curious what your thoughts are there.

   - I still don't fully understand how incremental updates due to model
   drift would tie in at the API level. There's a comment thread in the doc
   still open tracking this, so no need to comment here additionally. Just
   pointing it out as one of the things that stands out as potentially having
   API-level impacts to me that doesn't seem 100% fleshed out in the doc yet
   (though that admittedly may just be my limited understanding at this point
   :-).

-Tyler

On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com> wrote:

> Hi Tyler - my bad. Comments should be enabled now.
>
> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> <takidau@google.com.invalid> wrote:
>
> > Thanks a lot, Kam. Can you please enable comment access on the doc?
> > I seem to have view access only.
> >
> > -Tyler
> >
> > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > A number of readers have made comments on this topic recently. We
> > > have created a document that does some analysis of common ML
> > > models and related APIs. We hope this can drive an approach that
> > > will result in an API, compatibility matrix and involvement from
> > > the same groups that are implementing transformation runners
> > > (spark, flink, etc). We welcome comments here or in the document
> > > itself.
> > >
> > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > >
> >
>

Re: machine learning API, common models

Posted by Tyler Akidau <ta...@google.com.INVALID>.

Hi Kam & Soila,

Thanks a lot for writing this up. I ran the doc past some of the folks
who've been doing ML work here at Google, and they were generally happy
with the distillation of common methods in the doc. I'd be curious to hear
what folks on the Flink- and Spark- runner sides think.

To me, this seems like a good direction for a high-level API. Presumably,
once a high-level API is in place, we could begin looking at what it would
take to add lower-level ML algorithm support (e.g. iterative) to the Beam
Model. Is this essentially what you're thinking?

Some more specific questions/comments:

   - Presumably you'd want to tackle this in Java first, since that's the
   only language we currently support? Given that half of your examples are in
   Python, I'm also assuming Python will be interesting once it's available.

   - Along those lines, what languages are represented in the capability
   matrix? E.g. is Spark ML support as detailed there identical across
   Java/Scala and Python?

   - Have you thought about how this would tie in at the runner level,
   particularly given the updated Runner API changes that are coming? I'm
   assuming they'd be provided as composite transforms that (for now) would
   have no default implementation, given the lack of low-level primitives for
   ML algorithms, but am curious what your thoughts are there.

   - I still don't fully understand how incremental updates due to model
   drift would tie in at the API level. There's a comment thread in the doc
   still open tracking this, so no need to comment here additionally. Just
   pointing it out as one of the things that stands out as potentially having
   API-level impacts to me that doesn't seem 100% fleshed out in the doc yet
   (though that admittedly may just be my limited understanding at this point
   :-).

-Tyler
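
A minimal sketch of the third bullet above (composite transforms with no default implementation that a runner must supply), under stated assumptions: this does not use the Beam SDK, and MlTransform, Normalize, ToyRunner and scale_by_max are hypothetical names invented for illustration only.

```python
from typing import Callable, Dict, List


class MlTransform:
    """Hypothetical high-level ML transform with no default implementation."""
    name = "MlTransform"

    def expand(self, data: List[float]) -> List[float]:
        # No low-level primitives exist yet, so there is nothing to fall back on.
        raise NotImplementedError(
            f"{self.name} has no default implementation; a runner must override it"
        )


class Normalize(MlTransform):
    name = "Normalize"


class ToyRunner:
    """Hypothetical runner that maps transform types to native implementations."""

    def __init__(self) -> None:
        self._overrides: Dict[type, Callable[[List[float]], List[float]]] = {}

    def register(self, transform_type: type,
                 impl: Callable[[List[float]], List[float]]) -> None:
        self._overrides[transform_type] = impl

    def apply(self, transform: MlTransform, data: List[float]) -> List[float]:
        impl = self._overrides.get(type(transform))
        if impl is None:
            # Falls through to the transform itself, which raises.
            return transform.expand(data)
        return impl(data)


def scale_by_max(xs: List[float]) -> List[float]:
    # Stand-in for a runner's native implementation of Normalize.
    top = max(xs)
    return [x / top for x in xs]


runner = ToyRunner()
runner.register(Normalize, scale_by_max)
result = runner.apply(Normalize(), [1.0, 2.0, 4.0])
print(result)
```

A runner that has not registered an override fails loudly with NotImplementedError, which is roughly what a capability matrix entry of "unsupported" would mean in practice.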




On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <ka...@gmail.com> wrote:

> Hi Tyler - my bad. Comments should be enabled now.
>
> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau
> <takidau@google.com.invalid> wrote:
>
> > Thanks a lot, Kam. Can you please enable comment access on the doc? I
> > seem to have view access only.
> >
> > -Tyler
> >
> > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > A number of readers have made comments on this topic recently. We have
> > > created a document that does some analysis of common ML models and
> > > related APIs. We hope this can drive an approach that will result in an
> > > API, compatibility matrix and involvement from the same groups that are
> > > implementing transformation runners (spark, flink, etc). We welcome
> > > comments here or in the document itself.
> > >
> > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> > >
> >
>

Re: machine learning API, common models

Posted by Kam Kasravi <ka...@gmail.com>.

Hi Tyler - my bad. Comments should be enabled now.

On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <ta...@google.com.invalid>
wrote:

> Thanks a lot, Kam. Can you please enable comment access on the doc? I seem
> to have view access only.
>
> -Tyler
>
> On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:
>
> > Hi
> >
> > A number of readers have made comments on this topic recently. We have
> > created a document that does some analysis of common ML models and related
> > APIs. We hope this can drive an approach that will result in an API,
> > compatibility matrix and involvement from the same groups that are
> > implementing transformation runners (spark, flink, etc). We welcome
> > comments here or in the document itself.
> >
> >
> >
> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
> >
>

Re: machine learning API, common models

Posted by Tyler Akidau <ta...@google.com.INVALID>.

Thanks a lot, Kam. Can you please enable comment access on the doc? I seem
to have view access only.

-Tyler

On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <ka...@gmail.com> wrote:

> Hi
>
> A number of readers have made comments on this topic recently. We have
> created a document that does some analysis of common ML models and related
> APIs. We hope this can drive an approach that will result in an API,
> compatibility matrix and involvement from the same groups that are
> implementing transformation runners (spark, flink, etc). We welcome
> comments here or in the document itself.
>
>
> https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
>