Posted to dev@mahout.apache.org by Trevor Grant <tr...@gmail.com> on 2016/12/09 16:04:35 UTC

[DISCUSS] Mahout Streaming Bindings

Wanted to kick off a discussion about whether we're ready to start thinking
about coming up with some bindings for streaming engines.

Had a question at a meetup Tuesday in Seattle, and it got me thinking about
it.

In my mind they would be a discrete set of bindings: Flink Streaming /
Spark Streaming / (Beam?) / etc.

I have some fleeting thoughts, but didn't write any of the bindings and
have only attempted to grok them in passing.

Or maybe not something we want to pursue at this time? However, it is
something I'd be interested in tinkering with on a branch of my own...

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*

Re: [DISCUSS] Mahout Streaming Bindings

Posted by Trevor Grant <tr...@gmail.com>.
Hey Saikat, not really.

I was opening the discussion on whether or not we wanted to pursue Mahout
on streaming at this time.

The CL integration was just an aside.


<https://github.com/andrewpalumbo/mahout/tree/viennacl>





On Fri, Dec 9, 2016 at 4:09 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:

> Trevor,
>
> Out of curiosity, is this JIRA item related to this:
> https://www.mail-archive.com/dev@mahout.apache.org/msg32584.html
>
> I had wanted to help out on this and found the following:
> https://github.com/nativelibs4java/ScalaCL
>
> Any interest in looking into hooking this into mahout-scala?

Re: [DISCUSS] Mahout Streaming Bindings

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Trevor,

Out of curiosity, is this JIRA item related to this: https://www.mail-archive.com/dev@mahout.apache.org/msg32584.html



I had wanted to help out on this and found the following: https://github.com/nativelibs4java/ScalaCL


Any interest in looking into hooking this into mahout-scala?




RE: [DISCUSS] Mahout Streaming Bindings

Posted by Andrew Palumbo <ap...@outlook.com>.
I am leaning towards Suneel's position on this, though for different reasons and three orders of magnitude less negative.

Re: GPU and streams, it's difficult. There is a lot of overhead in copying from main memory to GPU memory, so it makes sense to wait for a batch of data to come in anyway with our framework.

There could be some interesting windowing applications for this, e.g. stream a 5-second window to main memory, copy to GPU, perform the operation, copy back, etc.
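
A minimal sketch of the windowing idea above, written in plain Python rather than against any Mahout or GPU API. Everything in it is an assumption made for illustration: the names windowed_batches and process_batch are hypothetical, events are assumed to arrive in timestamp order as (seconds, value) pairs, and process_batch only sums on the CPU as a stand-in for "copy to GPU, perform operation, copy back". The point it tries to show is that the per-transfer overhead is paid once per 5-second batch instead of once per event.

```python
from typing import Iterable, Iterator, List, Tuple


def windowed_batches(
    events: Iterable[Tuple[float, float]], window_sec: float = 5.0
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (window_index, values) for fixed, non-overlapping time windows.

    Assumes events arrive in non-decreasing timestamp order; a window is
    emitted as soon as the first event of a later window is seen.
    """
    current_idx = None
    batch: List[float] = []
    for ts, value in events:
        idx = int(ts // window_sec)
        if current_idx is not None and idx != current_idx:
            yield current_idx, batch
            batch = []
        current_idx = idx
        batch.append(value)
    if batch:
        # Flush the last, possibly partial, window at end of stream.
        yield current_idx, batch


def process_batch(batch: List[float]) -> float:
    """Hypothetical stand-in for: copy batch to GPU, compute, copy back.

    It sums on the CPU so the sketch runs anywhere.
    """
    return sum(batch)


# Six events spread over three 5-second windows.
events = [(0.5, 1.0), (1.2, 2.0), (4.9, 3.0), (5.0, 4.0), (9.9, 5.0), (12.0, 6.0)]
results = [(idx, process_batch(b)) for idx, b in windowed_batches(events)]
print(results)  # [(0, 6.0), (1, 9.0), (2, 6.0)]
```

Late or out-of-order events are ignored here on purpose; a real streaming engine would need its own policy for them.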

Streaming directly to GPU memory would be a very cool engineering problem that I wish I had the time to work on.

All in all, I'd say I'm -0.9 on this, since we have a good amount of technical debt that we have to pony up for at the moment.

I would think it is something that we may want to revisit either after 0.14.0 (algos) or after 1.0.





-------- Original message --------
From: Suneel Marthi <sm...@apache.org>
Date: 12/09/2016 9:13 AM (GMT-08:00)
To: mahout <de...@mahout.apache.org>
Subject: Re: [DISCUSS] Mahout Streaming Bindings

-100

It doesn't make sense to add support for all of the batch and streaming
engines that are available. We presently have an H2O binding which has never
seen any use and which I have long been thinking of trashing.

It would be a more productive use of our very limited resources and time
to port the legacy Recommender algorithms to Samsara, as opposed to
adding any more of these streaming/batch frameworks.

I have had folks asking me if we support Akka Streams, Apex, etc., but most
folks don't know how they are going to be using it or what it is they
are trying to accomplish.

How many Recommender Systems are out there today that need Streaming?
Amazon doesn't do Realtime streaming recommendations yet. Lambda
frameworks like Oryx 2.0, Summingbird, etc. support that, and Mahout-Spark
should be pluggable as the ML engine into those frameworks.

My 2¢.


Re: [DISCUSS] Mahout Streaming Bindings

Posted by Suneel Marthi <sm...@apache.org>.
-100

It doesn't make sense to add support for all of the batch and streaming
engines that are available. We presently have an H2O binding which has never
seen any use and which I have long been thinking of trashing.

It would be a more productive use of our very limited resources and time
to port the legacy Recommender algorithms to Samsara, as opposed to
adding any more of these streaming/batch frameworks.

I have had folks asking me if we support Akka Streams, Apex, etc., but most
folks don't know how they are going to be using it or what it is they
are trying to accomplish.

How many Recommender Systems are out there today that need Streaming?
Amazon doesn't do Realtime streaming recommendations yet. Lambda
frameworks like Oryx 2.0, Summingbird, etc. support that, and Mahout-Spark
should be pluggable as the ML engine into those frameworks.

My 2¢.


Re: [DISCUSS] Mahout Streaming Bindings

Posted by Trevor Grant <tr...@gmail.com>.
I was thinking, if it's a thing we want to pursue, let's maybe attempt some
initial stabs: perhaps start a branch, and get a feel for how scary a
nightmare it's going to be. Then we can start talking timetables.

In my mind it will be one of:
1- Crazy easy: just creating a distributed context for the streaming
context, and everything else kind of falls into place
2- Crazy hard: nearly reinventing the wheel
3- Some mixture of the two

The upshot with the imminent GPU acceleration: we'd be the first streaming
+ GPU, which would be quite dope.




On Fri, Dec 9, 2016 at 10:52 AM, Andrew Palumbo <ap...@outlook.com> wrote:

> It's  that this is something that we've been talking about for a while...
> would you be thinking of doing this before 1.0 or after 1.0?

RE: [DISCUSS] Mahout Streaming Bindings

Posted by Andrew Palumbo <ap...@outlook.com>.
It's  that this is something that we've been talking about for a while... would you be thinking of doing this before 1.0 or after 1.0?

