Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2012/05/24 00:23:15 UTC

Online machine learning on top of Hama BSP

Hi,

Is anyone interested in online machine learning?

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Thomas,

I think that none of us wants to start a flame war here.

As a disclaimer, I have to remark that I'm biased towards Giraph as well,
because besides my engagement at Mahout, I'm a committer and PMC member of
Giraph.

Regarding commit statistics: a single commit can correct a comment or
rewrite a whole layer of an application, so looking at the raw number of
commits is useless.

In my personal opinion, Mahout will have to move away from
Hadoop/MapReduce for a lot of problems. The question of which alternative
execution model to integrate is a hard one, as is deciding when
this should happen. The answer to that question will determine the
future of Mahout, and a discussion about it should be calm.

I think the real question is whether BSP itself is the optimal execution
model (regardless of the flavor of implementation) or whether Mahout
would do better to wait for a viable implementation of an asynchronous
execution model similar to what is implemented in GraphLab.
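
To make the BSP side of that question concrete, here is a minimal sketch of
the superstep pattern (local compute, message exchange, barrier). It is a
single-process Python simulation, not Hama or Giraph code; the name `bsp_run`,
the ring topology and the averaging step are assumptions chosen purely for
illustration.

```python
def bsp_run(peer_values, supersteps):
    """Simulate BSP supersteps for a ring of peers, each holding one number."""
    n = len(peer_values)
    values = list(peer_values)
    for _ in range(supersteps):
        # Communication phase: every peer sends its current value to its
        # right-hand neighbour (hypothetical ring topology).
        outbox = [[] for _ in range(n)]
        for peer in range(n):
            outbox[(peer + 1) % n].append(values[peer])
        # Barrier: messages only become visible once every peer has sent.
        inbox = outbox
        # Local computation for the next superstep: average the peer's own
        # value with whatever it received.
        values = [
            (values[p] + sum(inbox[p])) / (1 + len(inbox[p]))
            for p in range(n)
        ]
    return values


print(bsp_run([0.0, 4.0], supersteps=1))
```

An asynchronous model would drop the barrier and let peers read whatever
values their neighbours currently hold, which is the trade-off in question.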

--sebastian

On 26.05.2012 11:26, Thomas Jungblut wrote:
> Hi Ted,
> 
> please keep this factual, we are not here to start a flame war.
> But to correct you, if you take a closter look at the mailing list
> statistics [1]:
> hama-commits: 1.51 mails per day (AVG)
> Opposed to giraph:
> giraph-commits: 0.68 mails per day (AVG)
> So we have a more faster development than giraph.
> Also we work on top of HDFS, so you can combine mapreduce jobs with BSP
> jobs easily.
> We are just not running inside of MapReduce, these things will neglect
> anyways when YARN has a stable release.
> Currently Hama can operate on YARN with it's on ApplicationMaster whereas
> Giraph still needs to be on top of MapReduce.
> 
> Now to you Sebastian,
> 
>> Interesting discussion, which examples do you have in mind that might be
>> easier representable in general BSP than in Giraph/Pregel?
> 
> 
> straight forward translations from MPI for example. Someone of us is
> currently working on a SVM implementation in BSP, which originally was
> based on MPI.[2]
> We would love to have this contributed to mahout, but if Ted is not
> interested in Hama we will put this in our modules.
> Also there are graph problems that need major supervision like Top-K
> Shortest Paths, which cannot be easily expressed with aggregators.
> 
> We have benchmarks showing the scalability and maturity of Hama [3] and
> would be glad to roll out to several other Apache projects.
> BTW it would be cool if we could compare the performance of your k-means in
> MapReduce with that of our BSP version, you see the benchmark in [3] as
> well.
> 
> Actually that was not why were are here, we wanted to hear some general
> interest in real-time recommendation with Hama since all the ML guys are
> here. Even if Ted is a fanboy of giraph ;)
> 
> Regards from Berlin,
> Thomas
> 
> [1] http://pulse.apache.org/#incubator.apache.org
> [2] http://code.google.com/p/psvm/
> [3] http://wiki.apache.org/hama/Benchmarks
> 
> 
> 2012/5/26 Ted Dunning <te...@gmail.com>
> 
>> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <edwardyoon@apache.org
>>> wrote:
>>
>>>> Compared with Hama, what's the advantage of giraph? probably
>>>
>>> probably mature implementation? :D
>>>
>>
>> Yes.  And very active community.  And recent history of rapid development.
>>  And easy compatibility with map-reduce programs.
>>
> 
> 
>

Re: Online machine learning on top of Hama BSP

Posted by Ted Dunning <te...@gmail.com>.

These speeds are not far from what the new streaming k-means achieves
except that instead of 16 nodes it reaches those speeds (1 million points
in 20 seconds at 10 dimensions) on a single node.  This is with a trivially
parallel algorithm with no need for iteration.  Running this under Hadoop
would incur the normal startup costs (10-20 seconds with MapR), but
otherwise should run at the same speed adjusted for node count.

See https://github.com/tdunning/knn/tree/master/docs for more info on this
clustering algorithm.
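
For a rough sense of what a single-pass clustering looks like, here is a
minimal Python sketch. It is not the algorithm from the repository linked
above; `stream_cluster` and its fixed `threshold` parameter are assumptions
made for illustration only. Each point is seen once and is either folded into
the nearest centroid or used to start a new one, which is why no iteration
over the data is needed.

```python
import math


def stream_cluster(points, threshold):
    """One pass over `points`; returns a list of (centroid, weight) pairs."""
    centroids = []
    for p in points:
        # Find the nearest existing centroid, if there is one.
        best, best_d = None, float("inf")
        for i, (c, _) in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            # Too far from everything seen so far: start a new cluster.
            centroids.append(([float(x) for x in p], 1))
        else:
            # Fold the point into the nearest centroid as a weighted mean.
            c, w = centroids[best]
            merged = [(ci * w + pi) / (w + 1) for ci, pi in zip(c, p)]
            centroids[best] = (merged, w + 1)
    return centroids


print(stream_cluster([(0, 0), (0, 2), (10, 10)], threshold=3.0))
```

Because each point is handled independently of later points, the input can be
split across nodes and the resulting weighted centroids merged afterwards.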

On Sat, May 26, 2012 at 9:26 AM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> We have benchmarks showing the scalability and maturity of Hama [3] and
> would be glad to roll out to several other Apache projects.
> BTW it would be cool if we could compare the performance of your k-means in
> MapReduce with that of our BSP version, you see the benchmark in [3] as
> well.
>
> Actually that was not why were are here, we wanted to hear some general
> interest in real-time recommendation with Hama since all the ML guys are
> here. Even if Ted is a fanboy of giraph ;)
>
> Regards from Berlin,
> Thomas
>
> [1] http://pulse.apache.org/#incubator.apache.org
> [2] http://code.google.com/p/psvm/
> [3] http://wiki.apache.org/hama/Benchmarks
>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

Hi Ted,

please keep this factual, we are not here to start a flame war.
But to correct you, if you take a closer look at the mailing list
statistics [1]:
hama-commits: 1.51 mails per day (AVG)
As opposed to Giraph:
giraph-commits: 0.68 mails per day (AVG)
So our development is faster than Giraph's.
Also we work on top of HDFS, so you can combine MapReduce jobs with BSP
jobs easily.
We are just not running inside of MapReduce; this will become irrelevant
anyway once YARN has a stable release.
Currently Hama can operate on YARN with its own ApplicationMaster, whereas
Giraph still needs to be on top of MapReduce.

Now to you Sebastian,

> Interesting discussion, which examples do you have in mind that might be
> easier representable in general BSP than in Giraph/Pregel?

Straightforward translations from MPI, for example. One of us is
currently working on an SVM implementation in BSP, which originally was
based on MPI.[2]
We would love to have this contributed to Mahout, but if Ted is not
interested in Hama we will put this in our modules.
Also there are graph problems that need major supervision, like Top-K
Shortest Paths, which cannot be easily expressed with aggregators.

We have benchmarks showing the scalability and maturity of Hama [3] and
would be glad to roll out to several other Apache projects.
BTW it would be cool if we could compare the performance of your k-means in
MapReduce with that of our BSP version; you can see the benchmark in [3] as
well.

Actually that was not why we are here; we wanted to gauge general
interest in real-time recommendation with Hama, since all the ML guys are
here. Even if Ted is a fanboy of Giraph ;)

Regards from Berlin,
Thomas

[1] http://pulse.apache.org/#incubator.apache.org
[2] http://code.google.com/p/psvm/
[3] http://wiki.apache.org/hama/Benchmarks

2012/5/26 Ted Dunning <te...@gmail.com>

> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
>
> > > Compared with Hama, what's the advantage of giraph? probably
> >
> > probably mature implementation? :D
> >
>
> Yes.  And very active community.  And recent history of rapid development.
>  And easy compatibility with map-reduce programs.
>

-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Robin Anil <ro...@gmail.com>.

It doesn't come into Mahout's goals in any way. All I am saying is that
such a library could reduce the risk of Mahout moving to BSP or any other
platform. And it is something non-MapReduce devs should try to push if
they want ease of adoption.
On May 28, 2012 11:12 AM, "Sean Owen" <sr...@gmail.com> wrote:

> Personally -- note, personally -- I think that's a whole other project. I
> doubt Mahout will ever be anything but Hadoop-based, plus some sequential /
> pure Java bits. Or, put another way: that's way too much scope, to span a
> third (fourth?) computation model, in a project already sprawling.
>
> I think this is certainly could, should, just be another project. BSP-based
> or graph-based ML algorithms. No reason it can't be done by same or similar
> people or reuse code, etc. It's a good idea. I don't see a reason such a
> thing has to intersect with Mahout directly.
>
> Sean
>
> On Mon, May 28, 2012 at 5:08 PM, Robin Anil <ro...@gmail.com> wrote:
>
> > OK. So say mahout moves to using bsp. There are obviously risks you
> > mentioned.
> >
> > if possible we need to be abstracting out the underlying execution. So an
> > iterative algorithm should be written using a wrapper library that hides
> > giraph, bsp and map reduce. That's something I think will be attractive
> to
> > mahout community, because the risks would no longer be there. We would
> > implement any algorithm without betting on the future of any execution
> > model. And it will serve as a place where providers of each execution
> model
> > will strive to improve benchmarking against a common platform
> >
> > Is this something bsp dev would be willing to push?. Because the way I
> see
> > it things are stacked in favour of hadoop map reduce. And a common
> > execution library will help bsp push people to go away from map reduce
> > without the risk
> >
> > Robin
> > On May 28, 2012 6:41 AM, "Suraj Menon" <su...@apache.org> wrote:
> >
> > > First of all we would like to mention that the ugly side in this
> > > thread was totally not intended.
> > > From the options you gave, (c) would be a waste of time.
> > >
> > > The original intention of this thread was to politely check with
> > > Mahout community, if it would consider another programming model than
> > > Map-Reduce to implement machine learning algorithms. My previous mail
> > > was to check if there is any specific feature set (e.g.
> > > fault-tolerance, proven scalability, etc.) that is required before
> > > Mahout community would consider a new model.
> > >
> > > But, we do understand now that adoption of a new model could be based
> > > on popularity of the system among ML programmers which in turn builds
> > > a strong community for that project.
> > >
> > > Thanks,
> > > Suraj
> > >
> > > On Sun, May 27, 2012 at 12:11 PM, Robin Anil <ro...@gmail.com>
> > wrote:
> > > > I am confused, what is the actual ask from the Hama community to
> Mahout
> > > > community?
> > > >
> > > > Is that
> > > > a) Port Mahout algorithms to use BSP?
> > > > b) Rewrite Mahout algorithms to use BSP?
> > > > c) Argue that Hama is better than Giraph and vice versa?
> > > >
> > > > Because the response will depend on what the actual question is? This
> > > > thread seems to have lost the intended question.
> > > >
> > > >
> > > > ------
> > > > Robin Anil
> > > >
> > > >
> > > > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <te...@gmail.com>
> > > wrote:
> > > >
> > > >> The key thing to look for is implementation on a platform that is
> > widely
> > > >> accepted for practical data mining.
> > > >>
> > > >> We have only recently begun considering Pig as an implementation
> > > platform
> > > >> after deciding not to use it before.  What has changed is the fairly
> > > wide
> > > >> adoption of Pig.
> > > >>
> > > >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <
> menonsuraj5@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Steering back to relevance, it would be nice to know if there is
> an
> > > >> > expectation on features and benchmarks for any system to be
> > considered
> > > >> > as a platform to implement machine learning algorithms on Mahout.
> > > >> >
> > > >>
> > >
> >
>

Re: Online machine learning on top of Hama BSP

Posted by Sean Owen <sr...@gmail.com>.

Personally -- note, personally -- I think that's a whole other project. I
doubt Mahout will ever be anything but Hadoop-based, plus some sequential /
pure Java bits. Or, put another way: that's way too much scope, to span a
third (fourth?) computation model, in a project already sprawling.

I think this certainly could, and should, just be another project. BSP-based
or graph-based ML algorithms. No reason it can't be done by same or similar
people or reuse code, etc. It's a good idea. I don't see a reason such a
thing has to intersect with Mahout directly.

Sean

On Mon, May 28, 2012 at 5:08 PM, Robin Anil <ro...@gmail.com> wrote:

> OK. So say mahout moves to using bsp. There are obviously risks you
> mentioned.
>
> if possible we need to be abstracting out the underlying execution. So an
> iterative algorithm should be written using a wrapper library that hides
> giraph, bsp and map reduce. That's something I think will be attractive to
> mahout community, because the risks would no longer be there. We would
> implement any algorithm without betting on the future of any execution
> model. And it will serve as a place where providers of each execution model
> will strive to improve benchmarking against a common platform
>
> Is this something bsp dev would be willing to push?. Because the way I see
> it things are stacked in favour of hadoop map reduce. And a common
> execution library will help bsp push people to go away from map reduce
> without the risk
>
> Robin
> On May 28, 2012 6:41 AM, "Suraj Menon" <su...@apache.org> wrote:
>
> > First of all we would like to mention that the ugly side in this
> > thread was totally not intended.
> > From the options you gave, (c) would be a waste of time.
> >
> > The original intention of this thread was to politely check with
> > Mahout community, if it would consider another programming model than
> > Map-Reduce to implement machine learning algorithms. My previous mail
> > was to check if there is any specific feature set (e.g.
> > fault-tolerance, proven scalability, etc.) that is required before
> > Mahout community would consider a new model.
> >
> > But, we do understand now that adoption of a new model could be based
> > on popularity of the system among ML programmers which in turn builds
> > a strong community for that project.
> >
> > Thanks,
> > Suraj
> >
> > On Sun, May 27, 2012 at 12:11 PM, Robin Anil <ro...@gmail.com>
> wrote:
> > > I am confused, what is the actual ask from the Hama community to Mahout
> > > community?
> > >
> > > Is that
> > > a) Port Mahout algorithms to use BSP?
> > > b) Rewrite Mahout algorithms to use BSP?
> > > c) Argue that Hama is better than Giraph and vice versa?
> > >
> > > Because the response will depend on what the actual question is? This
> > > thread seems to have lost the intended question.
> > >
> > >
> > > ------
> > > Robin Anil
> > >
> > >
> > > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > >
> > >> The key thing to look for is implementation on a platform that is
> widely
> > >> accepted for practical data mining.
> > >>
> > >> We have only recently begun considering Pig as an implementation
> > platform
> > >> after deciding not to use it before.  What has changed is the fairly
> > wide
> > >> adoption of Pig.
> > >>
> > >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <me...@gmail.com>
> > >> wrote:
> > >>
> > >> > Steering back to relevance, it would be nice to know if there is an
> > >> > expectation on features and benchmarks for any system to be
> considered
> > >> > as a platform to implement machine learning algorithms on Mahout.
> > >> >
> > >>
> >
>

Re: Online machine learning on top of Hama BSP

Posted by Robin Anil <ro...@gmail.com>.

OK. So say Mahout moves to using BSP. There are obviously the risks you
mentioned.

If possible, we need to abstract out the underlying execution. So an
iterative algorithm should be written using a wrapper library that hides
Giraph, BSP and MapReduce. That's something I think will be attractive to
the Mahout community, because the risks would no longer be there. We would
implement any algorithm without betting on the future of any execution
model. And it will serve as a place where providers of each execution model
will strive to improve their benchmarks against a common platform.

Is this something the BSP devs would be willing to push? Because the way I see
it, things are stacked in favour of Hadoop MapReduce. And a common
execution library will help BSP push people to move away from MapReduce
without the risk.
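
As a rough illustration of the wrapper idea, the sketch below writes an
iterative algorithm against an abstract engine rather than a concrete
execution model. Everything here is hypothetical: `ExecutionEngine`,
`LocalEngine` and `sqrt_newton` are names invented for this example, and no
such library exists in Mahout, Hama or Giraph as far as this thread states.
Backends for BSP, Giraph or MapReduce would be further subclasses and are not
shown.

```python
from abc import ABC, abstractmethod


class ExecutionEngine(ABC):
    """Hypothetical interface that hides how the iterations are executed."""

    @abstractmethod
    def iterate(self, state, step, rounds):
        """Apply `step` to `state` for `rounds` iterations; return the result."""


class LocalEngine(ExecutionEngine):
    """Trivial in-process backend, standing in for a real cluster backend."""

    def iterate(self, state, step, rounds):
        for _ in range(rounds):
            state = step(state)
        return state


def sqrt_newton(engine, x, rounds=20):
    """An iterative algorithm that only knows about the engine interface."""
    return engine.iterate(x, lambda g: 0.5 * (g + x / g), rounds)


print(sqrt_newton(LocalEngine(), 9.0))
```

The algorithm never names a backend, so swapping the execution model would
mean passing a different engine object rather than rewriting the algorithm.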

Robin
On May 28, 2012 6:41 AM, "Suraj Menon" <su...@apache.org> wrote:

> First of all we would like to mention that the ugly side in this
> thread was totally not intended.
> From the options you gave, (c) would be a waste of time.
>
> The original intention of this thread was to politely check with
> Mahout community, if it would consider another programming model than
> Map-Reduce to implement machine learning algorithms. My previous mail
> was to check if there is any specific feature set (e.g.
> fault-tolerance, proven scalability, etc.) that is required before
> Mahout community would consider a new model.
>
> But, we do understand now that adoption of a new model could be based
> on popularity of the system among ML programmers which in turn builds
> a strong community for that project.
>
> Thanks,
> Suraj
>
> On Sun, May 27, 2012 at 12:11 PM, Robin Anil <ro...@gmail.com> wrote:
> > I am confused, what is the actual ask from the Hama community to Mahout
> > community?
> >
> > Is that
> > a) Port Mahout algorithms to use BSP?
> > b) Rewrite Mahout algorithms to use BSP?
> > c) Argue that Hama is better than Giraph and vice versa?
> >
> > Because the response will depend on what the actual question is? This
> > thread seems to have lost the intended question.
> >
> >
> > ------
> > Robin Anil
> >
> >
> > On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >
> >> The key thing to look for is implementation on a platform that is widely
> >> accepted for practical data mining.
> >>
> >> We have only recently begun considering Pig as an implementation
> platform
> >> after deciding not to use it before.  What has changed is the fairly
> wide
> >> adoption of Pig.
> >>
> >> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <me...@gmail.com>
> >> wrote:
> >>
> >> > Steering back to relevance, it would be nice to know if there is an
> >> > expectation on features and benchmarks for any system to be considered
> >> > as a platform to implement machine learning algorithms on Mahout.
> >> >
> >>
>

Re: Online machine learning on top of Hama BSP

Posted by Suraj Menon <su...@apache.org>.

First of all we would like to mention that the ugly side in this
thread was totally not intended.
From the options you gave, (c) would be a waste of time.

The original intention of this thread was to politely check with the
Mahout community whether it would consider a programming model other than
Map-Reduce to implement machine learning algorithms. My previous mail
was to check if there is any specific feature set (e.g.
fault-tolerance, proven scalability, etc.) that is required before the
Mahout community would consider a new model.

But, we do understand now that adoption of a new model could be based
on popularity of the system among ML programmers which in turn builds
a strong community for that project.

Thanks,
Suraj

On Sun, May 27, 2012 at 12:11 PM, Robin Anil <ro...@gmail.com> wrote:
> I am confused, what is the actual ask from the Hama community to Mahout
> community?
>
> Is that
> a) Port Mahout algorithms to use BSP?
> b) Rewrite Mahout algorithms to use BSP?
> c) Argue that Hama is better than Giraph and vice versa?
>
> Because the response will depend on what the actual question is? This
> thread seems to have lost the intended question.
>
>
> ------
> Robin Anil
>
>
> On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> The key thing to look for is implementation on a platform that is widely
>> accepted for practical data mining.
>>
>> We have only recently begun considering Pig as an implementation platform
>> after deciding not to use it before.  What has changed is the fairly wide
>> adoption of Pig.
>>
>> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <me...@gmail.com>
>> wrote:
>>
>> > Steering back to relevance, it would be nice to know if there is an
>> > expectation on features and benchmarks for any system to be considered
>> > as a platform to implement machine learning algorithms on Mahout.
>> >
>>

Re: Online machine learning on top of Hama BSP

Posted by Robin Anil <ro...@gmail.com>.

I am confused, what is the actual ask from the Hama community to Mahout
community?

Is that
a) Port Mahout algorithms to use BSP?
b) Rewrite Mahout algorithms to use BSP?
c) Argue that Hama is better than Giraph and vice versa?

Because the response will depend on what the actual question is. This
thread seems to have lost the intended question.

------
Robin Anil

On Sat, May 26, 2012 at 4:03 PM, Ted Dunning <te...@gmail.com> wrote:

> The key thing to look for is implementation on a platform that is widely
> accepted for practical data mining.
>
> We have only recently begun considering Pig as an implementation platform
> after deciding not to use it before.  What has changed is the fairly wide
> adoption of Pig.
>
> On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <me...@gmail.com>
> wrote:
>
> > Steering back to relevance, it would be nice to know if there is an
> > expectation on features and benchmarks for any system to be considered
> > as a platform to implement machine learning algorithms on Mahout.
> >
>

Re: Online machine learning on top of Hama BSP

Posted by Ted Dunning <te...@gmail.com>.

The key thing to look for is implementation on a platform that is widely
accepted for practical data mining.

We have only recently begun considering Pig as an implementation platform
after deciding not to use it before.  What has changed is the fairly wide
adoption of Pig.

On Sat, May 26, 2012 at 11:22 AM, Suraj Menon <me...@gmail.com> wrote:

> Steering back to relevance, it would be nice to know if there is an
> expectation on features and benchmarks for any system to be considered
> as a platform to implement machine learning algorithms on Mahout.
>

Re: Online machine learning on top of Hama BSP

Posted by Suraj Menon <me...@gmail.com>.

Steering back to relevance, it would be nice to know if there is an
expectation on features and benchmarks for any system to be considered
as a platform to implement machine learning algorithms on Mahout. This
would be good input for the Hama community. Compared to
Hadoop/MapReduce, Hama is young and evidently disruptive, even though it
is, and intends to remain, as compatible with Hadoop as possible. But
if you have any input on the aforesaid matters, it would give our
community a good direction in which to test Hama.

Thanks,
Suraj

On Sat, May 26, 2012 at 5:58 AM, Edward J. Yoon <ed...@apache.org> wrote:
> Pls stop the matador, Ted.
>
> Sent from my iPhone
>
> On May 26, 2012, at 4:54 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <ed...@apache.org>wrote:
>>
>>>> Compared with Hama, what's the advantage of giraph? probably
>>>
>>> probably mature implementation? :D
>>>
>>
>> Yes.  And very active community.  And recent history of rapid development.
>> And easy compatibility with map-reduce programs.

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

Pls stop the matador, Ted.

Sent from my iPhone

On May 26, 2012, at 4:54 PM, Ted Dunning <te...@gmail.com> wrote:

> On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <ed...@apache.org>wrote:
> 
>>> Compared with Hama, what's the advantage of giraph? probably
>> 
>> probably mature implementation? :D
>> 
> 
> Yes.  And very active community.  And recent history of rapid development.
> And easy compatibility with map-reduce programs.

Re: Online machine learning on top of Hama BSP

Posted by Ted Dunning <te...@gmail.com>.

On Fri, May 25, 2012 at 11:41 PM, Edward J. Yoon <ed...@apache.org>wrote:

> > Compared with Hama, what's the advantage of giraph? probably
>
> probably mature implementation? :D
>

Yes.  And very active community.  And recent history of rapid development.
 And easy compatibility with map-reduce programs.

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

> Compared with Hama, what's the advantage of giraph? probably

probably mature implementation? :D

Anyway, what I said was not a discussion of your preferences.

On Sat, May 26, 2012 at 8:31 AM, Edward J. Yoon <ed...@apache.org> wrote:
> Seba,
>
> Hama has Pregel layer. If you love Pregel, you can use it instead of
> basic BSP model.
>
> Ted,
>
> Compared with Hama, what's the advantage of giraph? probably
>
> On Sat, May 26, 2012 at 4:24 AM, Sebastian Schelter <ss...@apache.org> wrote:
>> Hi Thomas,
>>
>> Interesting discussion, which examples do you have in mind that might be
>> easier representable in general BSP than in Giraph/Pregel?
>>
>> To add my 2-cent: I think the real question whether BSP itself is the
>> best model for distributed machine learning or an asychronous model as
>> implemented in GraphLab should be preferred. But that's more a
>> scientific/esoteric question :)
>>
>> --sebastian
>>
>> On 25.05.2012 19:24, Thomas Jungblut wrote:
>>> Hi Ted,
>>>
>>> Giraph offers a graph layer that uses internally BSP on top of MapReduce.
>>> You don't have access to the BSP primitives, therefore you need to treat
>>> every machine learning problem as graph problem which maybe very
>>> inconvenient in many cases.
>>>
>>> 2012/5/25 Ted Dunning <te...@gmail.com>
>>>
>>>> Apache Giraph probably offers a more mature BSP model of computation.  My
>>>> guess is that it would make a stronger implementation substrate.  It
>>>> certainly has a very strong community.
>>>>
>>>> On Fri, May 25, 2012 at 10:44 AM, Thomas Jungblut <
>>>> thomas.jungblut@googlemail.com> wrote:
>>>>
>>>>> Hi Manuel,
>>>>>
>>>>> 300k is small, I have one with 6 mio clicks.
>>>>> However it is more a question of interest and what algorithms could be
>>>>> suitable for BSP.
>>>>> In case you wonder what BSP is, it stands for bulk synchronous parallel
>>>>> [1].
>>>>> We think that realtime and strongly iterative algorithms that are slow in
>>>>> mapreduce could be more efficiently solved with BSP.
>>>>> If you're interested, let us know.
>>>>>
>>>>> Regards,
>>>>> Thomas
>>>>>
>>>>> [1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>>>>>
>>>>> 2012/5/25 Manuel Blechschmidt <Ma...@gmx.de>
>>>>>
>>>>>> Hi Edward,
>>>>>> do you already have a test dataset?
>>>>>>
>>>>>> I might get one with about 300.000 clicks for you.
>>>>>>
>>>>>> It is from www.nelou.com and we are already running a recommender in
>>>>>> preview mode:
>>>>>>
>>>>>
>>>> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
>>>>>>
>>>>>> It could be the case that you would have to sign an NDA. Would this be
>>>>>> possible for you?
>>>>>>
>>>>>> /Manuel
>>>>>>
>>>>>> On 25.05.2012, at 10:34, Edward J. Yoon wrote:
>>>>>>
>>>>>>> OKay, I'm FWD this to mahout dev.
>>>>>>>
>>>>>>> I'm planning to create a project related to On-line machine learning,
>>>>>>> as a Apache Hama sub-module. Since the graph of message queues and
>>>>>>> workers could be implemented using BSP (see also [1]). The first idea
>>>>>>> is On-line recommendation system based on click-stream data.
>>>>>>>
>>>>>>> If you have interested in this plan, let's talk together here.
>>>>>>>
>>>>>>> 1.
>>>>>>
>>>>>
>>>> http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
>>>>>>>
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Thomas Jungblut <th...@googlemail.com>
>>>>>>> Date: Fri, May 25, 2012 at 4:55 PM
>>>>>>> Subject: Re: Online machine learning on top of Hama BSP
>>>>>>> To: dev@hama.apache.org
>>>>>>>
>>>>>>>
>>>>>>> Should we cooperate with the Mahout guys on this? I'm pretty sure
>>>> they
>>>>>>> would have fun with it.
>>>>>>> Edward, do you want to ask them?
>>>>>>>
>>>>>>> 2012/5/25 Tommaso Teofili <to...@gmail.com>
>>>>>>>
>>>>>>>> Do you have a plan for that Edward?
>>>>>>>> A separate package in examples or a separate (online) machine
>>>> learning
>>>>>>>> module? Or something else?
>>>>>>>> Regards
>>>>>>>> Tommaso
>>>>>>>>
>>>>>>>> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>>>>>>>>
>>>>>>>>> OKay, then let's get started.
>>>>>>>>>
>>>>>>>>> My first idea is simple online recommendation system based on
>>>>>>>> click-stream
>>>>>>>>> data.
>>>>>>>>>
>>>>>>>>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
>>>>>>>>> <pr...@gmail.com> wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> For those who are interested in ML, please check this. GNU Octave
>>>> is
>>>>>>>>> used.
>>>>>>>>>>
>>>>>>>>>> https://www.coursera.org/course/ml
>>>>>>>>>>
>>>>>>>>>> Another session is yet to be announced.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Praveen
>>>>>>>>>>
>>>>>>>>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
>>>>>>>>>> thomas.jungblut@googlemail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>>> and same here :)
>>>>>>>>>>>>
>>>>>>>>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 me too
>>>>>>>>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
>>>>>>>>> sarawgi.aditya@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> I would be happy to help :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
>>>>>>>>>>> edwardyoon@apache.org
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone interesting in online machine learning?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>>>>>>>> @eddieyoon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Aditya Sarawgi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Thomas Jungblut
>>>>>>>>>>> Berlin <th...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>> @eddieyoon
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thomas Jungblut
>>>>>>> Berlin <th...@gmail.com>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>
>>>>>> --
>>>>>> Manuel Blechschmidt
>>>>>> Dortustr. 57
>>>>>> 14467 Potsdam
>>>>>> Mobil: 0173/6322621
>>>>>> Twitter: http://twitter.com/Manuel_B
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thomas Jungblut
>>>>> Berlin <th...@gmail.com>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

Seba,

Hama has a Pregel layer. If you love Pregel, you can use it instead of
the basic BSP model.

Ted,

Compared with Hama, what's the advantage of Giraph? probably

On Sat, May 26, 2012 at 4:24 AM, Sebastian Schelter <ss...@apache.org> wrote:
> Hi Thomas,
>
> Interesting discussion, which examples do you have in mind that might be
> easier representable in general BSP than in Giraph/Pregel?
>
> To add my 2-cent: I think the real question whether BSP itself is the
> best model for distributed machine learning or an asychronous model as
> implemented in GraphLab should be preferred. But that's more a
> scientific/esoteric question :)
>
> --sebastian
>
> On 25.05.2012 19:24, Thomas Jungblut wrote:
>> Hi Ted,
>>
>> Giraph offers a graph layer that uses internally BSP on top of MapReduce.
>> You don't have access to the BSP primitives, therefore you need to treat
>> every machine learning problem as graph problem which maybe very
>> inconvenient in many cases.
>>
>> 2012/5/25 Ted Dunning <te...@gmail.com>
>>
>>> Apache Giraph probably offers a more mature BSP model of computation.  My
>>> guess is that it would make a stronger implementation substrate.  It
>>> certainly has a very strong community.
>>>
>>> On Fri, May 25, 2012 at 10:44 AM, Thomas Jungblut <
>>> thomas.jungblut@googlemail.com> wrote:
>>>
>>>> Hi Manuel,
>>>>
>>>> 300k is small, I have one with 6 mio clicks.
>>>> However it is more a question of interest and what algorithms could be
>>>> suitable for BSP.
>>>> In case you wonder what BSP is, it stands for bulk synchronous parallel
>>>> [1].
>>>> We think that realtime and strongly iterative algorithms that are slow in
>>>> mapreduce could be more efficiently solved with BSP.
>>>> If you're interested, let us know.
>>>>
>>>> Regards,
>>>> Thomas
>>>>
>>>> [1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>>>>
>>>> 2012/5/25 Manuel Blechschmidt <Ma...@gmx.de>
>>>>
>>>>> Hi Edward,
>>>>> do you already have a test dataset?
>>>>>
>>>>> I might get one with about 300.000 clicks for you.
>>>>>
>>>>> It is from www.nelou.com and we are already running a recommender in
>>>>> preview mode:
>>>>>
>>>>
>>> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
>>>>>
>>>>> It could be the case that you would have to sign an NDA. Would this be
>>>>> possible for you?
>>>>>
>>>>> /Manuel
>>>>>
>>>>> On 25.05.2012, at 10:34, Edward J. Yoon wrote:
>>>>>
>>>>>> OKay, I'm FWD this to mahout dev.
>>>>>>
>>>>>> I'm planning to create a project related to On-line machine learning,
>>>>>> as a Apache Hama sub-module. Since the graph of message queues and
>>>>>> workers could be implemented using BSP (see also [1]). The first idea
>>>>>> is On-line recommendation system based on click-stream data.
>>>>>>
>>>>>> If you have interested in this plan, let's talk together here.
>>>>>>
>>>>>> 1.
>>>>>
>>>>
>>> http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Thomas Jungblut <th...@googlemail.com>
>>>>>> Date: Fri, May 25, 2012 at 4:55 PM
>>>>>> Subject: Re: Online machine learning on top of Hama BSP
>>>>>> To: dev@hama.apache.org
>>>>>>
>>>>>>
>>>>>> Should we cooperate with the Mahout guys on this? I'm pretty sure
>>> they
>>>>>> would have fun with it.
>>>>>> Edward, do you want to ask them?
>>>>>>
>>>>>> 2012/5/25 Tommaso Teofili <to...@gmail.com>
>>>>>>
>>>>>>> Do you have a plan for that Edward?
>>>>>>> A separate package in examples or a separate (online) machine
>>> learning
>>>>>>> module? Or something else?
>>>>>>> Regards
>>>>>>> Tommaso
>>>>>>>
>>>>>>> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>>>>>>>
>>>>>>>> OKay, then let's get started.
>>>>>>>>
>>>>>>>> My first idea is simple online recommendation system based on
>>>>>>> click-stream
>>>>>>>> data.
>>>>>>>>
>>>>>>>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
>>>>>>>> <pr...@gmail.com> wrote:
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> For those who are interested in ML, please check this. GNU Octave
>>> is
>>>>>>>> used.
>>>>>>>>>
>>>>>>>>> https://www.coursera.org/course/ml
>>>>>>>>>
>>>>>>>>> Another session is yet to be announced.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Praveen
>>>>>>>>>
>>>>>>>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
>>>>>>>>> thomas.jungblut@googlemail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
>>>>>>>>>>
>>>>>>>>>>> and same here :)
>>>>>>>>>>>
>>>>>>>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>>> +1 me too
>>>>>>>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
>>>>>>>> sarawgi.aditya@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>> I would be happy to help :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
>>>>>>>>>> edwardyoon@apache.org
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone interesting in online machine learning?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>>>>>>> @eddieyoon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Aditya Sarawgi
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thomas Jungblut
>>>>>>>>>> Berlin <th...@gmail.com>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>> @eddieyoon
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thomas Jungblut
>>>>>> Berlin <th...@gmail.com>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>>
>>>>> --
>>>>> Manuel Blechschmidt
>>>>> Dortustr. 57
>>>>> 14467 Potsdam
>>>>> Mobil: 0173/6322621
>>>>> Twitter: http://twitter.com/Manuel_B
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thomas Jungblut
>>>> Berlin <th...@gmail.com>
>>>>
>>>
>>
>>
>>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Thomas,

Interesting discussion. Which examples do you have in mind that might be
easier to represent in general BSP than in Giraph/Pregel?

To add my 2 cents: I think the real question is whether BSP itself is the
best model for distributed machine learning, or whether an asynchronous
model as implemented in GraphLab should be preferred. But that's more of a
scientific/esoteric question :)

--sebastian

On 25.05.2012 19:24, Thomas Jungblut wrote:
> Hi Ted,
> 
> Giraph offers a graph layer that uses internally BSP on top of MapReduce.
> You don't have access to the BSP primitives, therefore you need to treat
> every machine learning problem as graph problem which maybe very
> inconvenient in many cases.
> 
> 2012/5/25 Ted Dunning <te...@gmail.com>
> 
>> Apache Giraph probably offers a more mature BSP model of computation.  My
>> guess is that it would make a stronger implementation substrate.  It
>> certainly has a very strong community.
>>
>> On Fri, May 25, 2012 at 10:44 AM, Thomas Jungblut <
>> thomas.jungblut@googlemail.com> wrote:
>>
>>> Hi Manuel,
>>>
>>> 300k is small, I have one with 6 mio clicks.
>>> However it is more a question of interest and what algorithms could be
>>> suitable for BSP.
>>> In case you wonder what BSP is, it stands for bulk synchronous parallel
>>> [1].
>>> We think that realtime and strongly iterative algorithms that are slow in
>>> mapreduce could be more efficiently solved with BSP.
>>> If you're interested, let us know.
>>>
>>> Regards,
>>> Thomas
>>>
>>> [1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>>>
>>> 2012/5/25 Manuel Blechschmidt <Ma...@gmx.de>
>>>
>>>> Hi Edward,
>>>> do you already have a test dataset?
>>>>
>>>> I might get one with about 300.000 clicks for you.
>>>>
>>>> It is from www.nelou.com and we are already running a recommender in
>>>> preview mode:
>>>>
>>>
>> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
>>>>
>>>> It could be the case that you would have to sign an NDA. Would this be
>>>> possible for you?
>>>>
>>>> /Manuel
>>>>
>>>> On 25.05.2012, at 10:34, Edward J. Yoon wrote:
>>>>
>>>>> OKay, I'm FWD this to mahout dev.
>>>>>
>>>>> I'm planning to create a project related to On-line machine learning,
>>>>> as a Apache Hama sub-module. Since the graph of message queues and
>>>>> workers could be implemented using BSP (see also [1]). The first idea
>>>>> is On-line recommendation system based on click-stream data.
>>>>>
>>>>> If you have interested in this plan, let's talk together here.
>>>>>
>>>>> 1.
>>>>
>>>
>> http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Thomas Jungblut <th...@googlemail.com>
>>>>> Date: Fri, May 25, 2012 at 4:55 PM
>>>>> Subject: Re: Online machine learning on top of Hama BSP
>>>>> To: dev@hama.apache.org
>>>>>
>>>>>
>>>>> Should we cooperate with the Mahout guys on this? I'm pretty sure
>> they
>>>>> would have fun with it.
>>>>> Edward, do you want to ask them?
>>>>>
>>>>> 2012/5/25 Tommaso Teofili <to...@gmail.com>
>>>>>
>>>>>> Do you have a plan for that Edward?
>>>>>> A separate package in examples or a separate (online) machine
>> learning
>>>>>> module? Or something else?
>>>>>> Regards
>>>>>> Tommaso
>>>>>>
>>>>>> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>>>>>>
>>>>>>> OKay, then let's get started.
>>>>>>>
>>>>>>> My first idea is simple online recommendation system based on
>>>>>> click-stream
>>>>>>> data.
>>>>>>>
>>>>>>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
>>>>>>> <pr...@gmail.com> wrote:
>>>>>>>> +1
>>>>>>>>
>>>>>>>> For those who are interested in ML, please check this. GNU Octave
>> is
>>>>>>> used.
>>>>>>>>
>>>>>>>> https://www.coursera.org/course/ml
>>>>>>>>
>>>>>>>> Another session is yet to be announced.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Praveen
>>>>>>>>
>>>>>>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
>>>>>>>> thomas.jungblut@googlemail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
>>>>>>>>>
>>>>>>>>>> and same here :)
>>>>>>>>>>
>>>>>>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
>>>>>>>>>>
>>>>>>>>>>> +1 me too
>>>>>>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
>>>>>>> sarawgi.aditya@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>> I would be happy to help :)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
>>>>>>>>> edwardyoon@apache.org
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone interesting in online machine learning?
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>>>>>> @eddieyoon
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Aditya Sarawgi
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thomas Jungblut
>>>>>>>>> Berlin <th...@gmail.com>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thomas Jungblut
>>>>> Berlin <th...@gmail.com>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon
>>>>> @eddieyoon
>>>>
>>>> --
>>>> Manuel Blechschmidt
>>>> Dortustr. 57
>>>> 14467 Potsdam
>>>> Mobil: 0173/6322621
>>>> Twitter: http://twitter.com/Manuel_B
>>>>
>>>>
>>>
>>>
>>> --
>>> Thomas Jungblut
>>> Berlin <th...@gmail.com>
>>>
>>
> 
> 
>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

Hi Ted,

Giraph offers a graph layer that internally uses BSP on top of MapReduce.
You don't have access to the BSP primitives, so you need to treat every
machine learning problem as a graph problem, which may be very
inconvenient in many cases.
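
As a rough illustration of a non-graph problem, here is a minimal sketch of
data-parallel gradient descent written directly as supersteps. It is plain
Python, not Hama or Giraph code; the function name, the shard layout, and
the learning rate are assumptions made up for this example.

```python
def bsp_gradient_descent(shards, steps=200, lr=0.05):
    """Fit y = w * x by synchronous, data-parallel gradient descent.

    `shards` is a hypothetical partitioning of (x, y) pairs, one list per peer.
    """
    w = 0.0
    for _ in range(steps):  # one loop iteration stands for one superstep
        # Local computation: every peer computes a gradient on its own shard.
        messages = []
        for shard in shards:
            grad = sum(2 * (w * x - y) * x for x, y in shard)
            messages.append((grad, len(shard)))
        # Communication + barrier: all (gradient, count) pairs are exchanged.
        total_grad = sum(g for g, _ in messages)
        total_n = sum(n for _, n in messages)
        # Every peer applies the same update, so all copies of w stay equal.
        w -= lr * total_grad / total_n
    return w


# Three peers, each holding a shard of points that lie on y = 2 * x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0), (5.0, 10.0)]]
w = bsp_gradient_descent(shards)
print(round(w, 6))
```

There are no vertices or edges here, only peers, messages, and a barrier,
which is the kind of problem I mean.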

2012/5/25 Ted Dunning <te...@gmail.com>

> Apache Giraph probably offers a more mature BSP model of computation.  My
> guess is that it would make a stronger implementation substrate.  It
> certainly has a very strong community.
>
> On Fri, May 25, 2012 at 10:44 AM, Thomas Jungblut <
> thomas.jungblut@googlemail.com> wrote:
>
> > Hi Manuel,
> >
> > 300k is small, I have one with 6 mio clicks.
> > However it is more a question of interest and what algorithms could be
> > suitable for BSP.
> > In case you wonder what BSP is, it stands for bulk synchronous parallel
> > [1].
> > We think that realtime and strongly iterative algorithms that are slow in
> > mapreduce could be more efficiently solved with BSP.
> > If you're interested, let us know.
> >
> > Regards,
> > Thomas
> >
> > [1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
> >
> > 2012/5/25 Manuel Blechschmidt <Ma...@gmx.de>
> >
> > > Hi Edward,
> > > do you already have a test dataset?
> > >
> > > I might get one with about 300.000 clicks for you.
> > >
> > > It is from www.nelou.com and we are already running a recommender in
> > > preview mode:
> > >
> >
> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
> > >
> > > It could be the case that you would have to sign an NDA. Would this be
> > > possible for you?
> > >
> > > /Manuel
> > >
> > > On 25.05.2012, at 10:34, Edward J. Yoon wrote:
> > >
> > > > OKay, I'm FWD this to mahout dev.
> > > >
> > > > I'm planning to create a project related to On-line machine learning,
> > > > as a Apache Hama sub-module. Since the graph of message queues and
> > > > workers could be implemented using BSP (see also [1]). The first idea
> > > > is On-line recommendation system based on click-stream data.
> > > >
> > > > If you have interested in this plan, let's talk together here.
> > > >
> > > > 1.
> > >
> >
> http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Thomas Jungblut <th...@googlemail.com>
> > > > Date: Fri, May 25, 2012 at 4:55 PM
> > > > Subject: Re: Online machine learning on top of Hama BSP
> > > > To: dev@hama.apache.org
> > > >
> > > >
> > > > Should we cooperate with the Mahout guys on this? I'm pretty sure
> they
> > > > would have fun with it.
> > > > Edward, do you want to ask them?
> > > >
> > > > 2012/5/25 Tommaso Teofili <to...@gmail.com>
> > > >
> > > >> Do you have a plan for that Edward?
> > > >> A separate package in examples or a separate (online) machine
> learning
> > > >> module? Or something else?
> > > >> Regards
> > > >> Tommaso
> > > >>
> > > >> 2012/5/25 Edward J. Yoon <ed...@apache.org>
> > > >>
> > > >>> OKay, then let's get started.
> > > >>>
> > > >>> My first idea is simple online recommendation system based on
> > > >> click-stream
> > > >>> data.
> > > >>>
> > > >>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > > >>> <pr...@gmail.com> wrote:
> > > >>>> +1
> > > >>>>
> > > >>>> For those who are interested in ML, please check this. GNU Octave
> is
> > > >>> used.
> > > >>>>
> > > >>>> https://www.coursera.org/course/ml
> > > >>>>
> > > >>>> Another session is yet to be announced.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Praveen
> > > >>>>
> > > >>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > > >>>> thomas.jungblut@googlemail.com> wrote:
> > > >>>>
> > > >>>>> +1
> > > >>>>>
> > > >>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> > > >>>>>
> > > >>>>>> and same here :)
> > > >>>>>>
> > > >>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
> > > >>>>>>
> > > >>>>>>> +1 me too
> > > >>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> > > >>> sarawgi.aditya@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> +1
> > > >>>>>>>> I would be happy to help :)
> > > >>>>>>>>
> > > >>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> > > >>>>> edwardyoon@apache.org
> > > >>>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> Does anyone interesting in online machine learning?
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Best Regards, Edward J. Yoon
> > > >>>>>>>>> @eddieyoon
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> --
> > > >>>>>>>> Cheers,
> > > >>>>>>>> Aditya Sarawgi
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Thomas Jungblut
> > > >>>>> Berlin <th...@gmail.com>
> > > >>>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Best Regards, Edward J. Yoon
> > > >>> @eddieyoon
> > > >>>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Thomas Jungblut
> > > > Berlin <th...@gmail.com>
> > > >
> > > >
> > > > --
> > > > Best Regards, Edward J. Yoon
> > > > @eddieyoon
> > >
> > > --
> > > Manuel Blechschmidt
> > > Dortustr. 57
> > > 14467 Potsdam
> > > Mobil: 0173/6322621
> > > Twitter: http://twitter.com/Manuel_B
> > >
> > >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <th...@gmail.com>
> >
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Ted Dunning <te...@gmail.com>.

Apache Giraph probably offers a more mature BSP model of computation.  My
guess is that it would make a stronger implementation substrate.  It
certainly has a very strong community.

On Fri, May 25, 2012 at 10:44 AM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> Hi Manuel,
>
> 300k is small, I have one with 6 mio clicks.
> However it is more a question of interest and what algorithms could be
> suitable for BSP.
> In case you wonder what BSP is, it stands for bulk synchronous parallel
> [1].
> We think that realtime and strongly iterative algorithms that are slow in
> mapreduce could be more efficiently solved with BSP.
> If you're interested, let us know.
>
> Regards,
> Thomas
>
> [1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>
> 2012/5/25 Manuel Blechschmidt <Ma...@gmx.de>
>
> > Hi Edward,
> > do you already have a test dataset?
> >
> > I might get one with about 300.000 clicks for you.
> >
> > It is from www.nelou.com and we are already running a recommender in
> > preview mode:
> >
> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
> >
> > It could be the case that you would have to sign an NDA. Would this be
> > possible for you?
> >
> > /Manuel
> >
> > On 25.05.2012, at 10:34, Edward J. Yoon wrote:
> >
> > > OKay, I'm FWD this to mahout dev.
> > >
> > > I'm planning to create a project related to On-line machine learning,
> > > as a Apache Hama sub-module. Since the graph of message queues and
> > > workers could be implemented using BSP (see also [1]). The first idea
> > > is On-line recommendation system based on click-stream data.
> > >
> > > If you have interested in this plan, let's talk together here.
> > >
> > > 1.
> >
> http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
> > >
> > > ---------- Forwarded message ----------
> > > From: Thomas Jungblut <th...@googlemail.com>
> > > Date: Fri, May 25, 2012 at 4:55 PM
> > > Subject: Re: Online machine learning on top of Hama BSP
> > > To: dev@hama.apache.org
> > >
> > >
> > > Should we cooperate with the Mahout guys on this? I'm pretty sure they
> > > would have fun with it.
> > > Edward, do you want to ask them?
> > >
> > > 2012/5/25 Tommaso Teofili <to...@gmail.com>
> > >
> > >> Do you have a plan for that Edward?
> > >> A separate package in examples or a separate (online) machine learning
> > >> module? Or something else?
> > >> Regards
> > >> Tommaso
> > >>
> > >> 2012/5/25 Edward J. Yoon <ed...@apache.org>
> > >>
> > >>> OKay, then let's get started.
> > >>>
> > >>> My first idea is simple online recommendation system based on
> > >> click-stream
> > >>> data.
> > >>>
> > >>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > >>> <pr...@gmail.com> wrote:
> > >>>> +1
> > >>>>
> > >>>> For those who are interested in ML, please check this. GNU Octave is
> > >>> used.
> > >>>>
> > >>>> https://www.coursera.org/course/ml
> > >>>>
> > >>>> Another session is yet to be announced.
> > >>>>
> > >>>> Thanks,
> > >>>> Praveen
> > >>>>
> > >>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > >>>> thomas.jungblut@googlemail.com> wrote:
> > >>>>
> > >>>>> +1
> > >>>>>
> > >>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> > >>>>>
> > >>>>>> and same here :)
> > >>>>>>
> > >>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
> > >>>>>>
> > >>>>>>> +1 me too
> > >>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> > >>> sarawgi.aditya@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> +1
> > >>>>>>>> I would be happy to help :)
> > >>>>>>>>
> > >>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> > >>>>> edwardyoon@apache.org
> > >>>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> Does anyone interesting in online machine learning?
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Best Regards, Edward J. Yoon
> > >>>>>>>>> @eddieyoon
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Cheers,
> > >>>>>>>> Aditya Sarawgi
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Thomas Jungblut
> > >>>>> Berlin <th...@gmail.com>
> > >>>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards, Edward J. Yoon
> > >>> @eddieyoon
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Thomas Jungblut
> > > Berlin <th...@gmail.com>
> > >
> > >
> > > --
> > > Best Regards, Edward J. Yoon
> > > @eddieyoon
> >
> > --
> > Manuel Blechschmidt
> > Dortustr. 57
> > 14467 Potsdam
> > Mobil: 0173/6322621
> > Twitter: http://twitter.com/Manuel_B
> >
> >
>
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

Hi Manuel,

300k is small; I have one with 6 million clicks.
However, it is more a question of interest and of which algorithms could be
suitable for BSP.
In case you wonder what BSP is, it stands for bulk synchronous parallel [1].
We think that realtime and strongly iterative algorithms that are slow in
MapReduce could be solved more efficiently with BSP.
If you're interested, let us know.
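
To give a feeling for the model, here is a minimal single-process sketch of
BSP supersteps in plain Python. It is not the Hama API; `run_bsp`,
`count_clicks`, and the click numbers are hypothetical and only serve the
illustration. Each superstep is local computation, then message sending,
then a barrier after which the messages become visible.

```python
from collections import defaultdict


def run_bsp(num_peers, compute, max_supersteps):
    """Run `compute` on every peer in lock-step supersteps.

    compute(peer, superstep, inbox, state) returns a list of
    (target_peer, message) pairs. Messages sent in superstep s are only
    visible in superstep s + 1, which mimics the barrier synchronization.
    """
    states = [dict() for _ in range(num_peers)]
    inboxes = [[] for _ in range(num_peers)]
    for superstep in range(max_supersteps):
        outboxes = defaultdict(list)
        # Local computation and sending (conceptually parallel on all peers).
        for peer in range(num_peers):
            sent = compute(peer, superstep, inboxes[peer], states[peer])
            for target, msg in sent:
                outboxes[target].append(msg)
        # Barrier: deliver everything that was sent in this superstep.
        inboxes = [outboxes[p] for p in range(num_peers)]
        if not any(inboxes):
            break  # nobody sent anything, so the job is finished
    return states


# Hypothetical per-peer click counts from a partitioned click-stream.
LOCAL_CLICKS = [120, 75, 305, 40]


def count_clicks(peer, superstep, inbox, state):
    if superstep == 0:
        return [(0, LOCAL_CLICKS[peer])]  # send the local count to peer 0
    if peer == 0 and inbox:
        state["total"] = sum(inbox)  # peer 0 aggregates after the barrier
    return []


states = run_bsp(len(LOCAL_CLICKS), count_clicks, max_supersteps=5)
print(states[0]["total"])
```

An iterative algorithm just keeps sending messages in later supersteps
instead of stopping after the first exchange, without starting a new job
per iteration.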

Regards,
Thomas

[1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

2012/5/25 Manuel Blechschmidt <Ma...@gmx.de>

> Hi Edward,
> do you already have a test dataset?
>
> I might get one with about 300.000 clicks for you.
>
> It is from www.nelou.com and we are already running a recommender in
> preview mode:
> http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
>
> It could be the case that you would have to sign an NDA. Would this be
> possible for you?
>
> /Manuel
>
> On 25.05.2012, at 10:34, Edward J. Yoon wrote:
>
> > OKay, I'm FWD this to mahout dev.
> >
> > I'm planning to create a project related to On-line machine learning,
> > as a Apache Hama sub-module. Since the graph of message queues and
> > workers could be implemented using BSP (see also [1]). The first idea
> > is On-line recommendation system based on click-stream data.
> >
> > If you have interested in this plan, let's talk together here.
> >
> > 1.
> http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
> >
> > ---------- Forwarded message ----------
> > From: Thomas Jungblut <th...@googlemail.com>
> > Date: Fri, May 25, 2012 at 4:55 PM
> > Subject: Re: Online machine learning on top of Hama BSP
> > To: dev@hama.apache.org
> >
> >
> > Should we cooperate with the Mahout guys on this? I'm pretty sure they
> > would have fun with it.
> > Edward, do you want to ask them?
> >
> > 2012/5/25 Tommaso Teofili <to...@gmail.com>
> >
> >> Do you have a plan for that Edward?
> >> A separate package in examples or a separate (online) machine learning
> >> module? Or something else?
> >> Regards
> >> Tommaso
> >>
> >> 2012/5/25 Edward J. Yoon <ed...@apache.org>
> >>
> >>> OKay, then let's get started.
> >>>
> >>> My first idea is simple online recommendation system based on
> >> click-stream
> >>> data.
> >>>
> >>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> >>> <pr...@gmail.com> wrote:
> >>>> +1
> >>>>
> >>>> For those who are interested in ML, please check this. GNU Octave is
> >>> used.
> >>>>
> >>>> https://www.coursera.org/course/ml
> >>>>
> >>>> Another session is yet to be announced.
> >>>>
> >>>> Thanks,
> >>>> Praveen
> >>>>
> >>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> >>>> thomas.jungblut@googlemail.com> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> >>>>>
> >>>>>> and same here :)
> >>>>>>
> >>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
> >>>>>>
> >>>>>>> +1 me too
> >>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> >>> sarawgi.aditya@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> +1
> >>>>>>>> I would be happy to help :)
> >>>>>>>>
> >>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> >>>>> edwardyoon@apache.org
> >>>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Does anyone interesting in online machine learning?
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best Regards, Edward J. Yoon
> >>>>>>>>> @eddieyoon
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Cheers,
> >>>>>>>> Aditya Sarawgi
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Thomas Jungblut
> >>>>> Berlin <th...@gmail.com>
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>>
> >>
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <th...@gmail.com>
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
>
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
>
>


-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Manuel Blechschmidt <Ma...@gmx.de>.

Hi Edward,
do you already have a test dataset?

I might be able to get one with about 300,000 clicks for you.

It is from www.nelou.com, where we are already running a recommender in preview mode:
http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode

You might have to sign an NDA to use it. Would that be possible for you?

/Manuel

On 25.05.2012, at 10:34, Edward J. Yoon wrote:

> OKay, I'm FWD this to mahout dev.
> 
> I'm planning to create a project related to On-line machine learning,
> as a Apache Hama sub-module. Since the graph of message queues and
> workers could be implemented using BSP (see also [1]). The first idea
> is On-line recommendation system based on click-stream data.
> 
> If you have interested in this plan, let's talk together here.
> 
> 1. http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
> 
> ---------- Forwarded message ----------
> From: Thomas Jungblut <th...@googlemail.com>
> Date: Fri, May 25, 2012 at 4:55 PM
> Subject: Re: Online machine learning on top of Hama BSP
> To: dev@hama.apache.org
> 
> 
> Should we cooperate with the Mahout guys on this? I'm pretty sure they
> would have fun with it.
> Edward, do you want to ask them?
> 
> 2012/5/25 Tommaso Teofili <to...@gmail.com>
> 
>> Do you have a plan for that Edward?
>> A separate package in examples or a separate (online) machine learning
>> module? Or something else?
>> Regards
>> Tommaso
>> 
>> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>> 
>>> OKay, then let's get started.
>>> 
>>> My first idea is simple online recommendation system based on
>> click-stream
>>> data.
>>> 
>>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
>>> <pr...@gmail.com> wrote:
>>>> +1
>>>> 
>>>> For those who are interested in ML, please check this. GNU Octave is
>>> used.
>>>> 
>>>> https://www.coursera.org/course/ml
>>>> 
>>>> Another session is yet to be announced.
>>>> 
>>>> Thanks,
>>>> Praveen
>>>> 
>>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
>>>> thomas.jungblut@googlemail.com> wrote:
>>>> 
>>>>> +1
>>>>> 
>>>>> 2012/5/24 Tommaso Teofili <to...@gmail.com>
>>>>> 
>>>>>> and same here :)
>>>>>> 
>>>>>> 2012/5/24 Vaijanath Rao <va...@gmail.com>
>>>>>> 
>>>>>>> +1 me too
>>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
>>> sarawgi.aditya@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> +1
>>>>>>>> I would be happy to help :)
>>>>>>>> 
>>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
>>>>> edwardyoon@apache.org
>>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Does anyone interesting in online machine learning?
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>> @eddieyoon
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Cheers,
>>>>>>>> Aditya Sarawgi
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Thomas Jungblut
>>>>> Berlin <th...@gmail.com>
>>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>> 
>> 
> 
> 
> 
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
> 
> 
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon

-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

CC'ing hama dev.

On Fri, May 25, 2012 at 5:34 PM, Edward J. Yoon <ed...@apache.org> wrote:
> OKay, I'm FWD this to mahout dev.
>
> I'm planning to create a project related to On-line machine learning,
> as a Apache Hama sub-module. Since the graph of message queues and
> workers could be implemented using BSP (see also [1]). The first idea
> is On-line recommendation system based on click-stream data.
>
> If you have interested in this plan, let's talk together here.
>
> 1. http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
>
> ---------- Forwarded message ----------
> From: Thomas Jungblut <th...@googlemail.com>
> Date: Fri, May 25, 2012 at 4:55 PM
> Subject: Re: Online machine learning on top of Hama BSP
> To: dev@hama.apache.org
>
>
> Should we cooperate with the Mahout guys on this? I'm pretty sure they
> would have fun with it.
> Edward, do you want to ask them?
>
> 2012/5/25 Tommaso Teofili <to...@gmail.com>
>
>> Do you have a plan for that Edward?
>> A separate package in examples or a separate (online) machine learning
>> module? Or something else?
>> Regards
>> Tommaso
>>
>> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>>
>> > OKay, then let's get started.
>> >
>> > My first idea is simple online recommendation system based on
>> click-stream
>> > data.
>> >
>> > On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
>> > <pr...@gmail.com> wrote:
>> > > +1
>> > >
>> > > For those who are interested in ML, please check this. GNU Octave is
>> > used.
>> > >
>> > > https://www.coursera.org/course/ml
>> > >
>> > > Another session is yet to be announced.
>> > >
>> > > Thanks,
>> > > Praveen
>> > >
>> > > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
>> > > thomas.jungblut@googlemail.com> wrote:
>> > >
>> > >> +1
>> > >>
>> > >> 2012/5/24 Tommaso Teofili <to...@gmail.com>
>> > >>
>> > >> > and same here :)
>> > >> >
>> > >> > 2012/5/24 Vaijanath Rao <va...@gmail.com>
>> > >> >
>> > >> > > +1 me too
>> > >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
>> > sarawgi.aditya@gmail.com>
>> > >> > > wrote:
>> > >> > >
>> > >> > > > +1
>> > >> > > > I would be happy to help :)
>> > >> > > >
>> > >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
>> > >> edwardyoon@apache.org
>> > >> > > > >wrote:
>> > >> > > >
>> > >> > > > > Hi,
>> > >> > > > >
>> > >> > > > > Does anyone interesting in online machine learning?
>> > >> > > > >
>> > >> > > > > --
>> > >> > > > > Best Regards, Edward J. Yoon
>> > >> > > > > @eddieyoon
>> > >> > > > >
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > > > --
>> > >> > > > Cheers,
>> > >> > > > Aditya Sarawgi
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Thomas Jungblut
>> > >> Berlin <th...@gmail.com>
>> > >>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>> >
>>
>
>
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Fwd: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

OK, I'm forwarding this to mahout dev.

I'm planning to create a project for online machine learning as an
Apache Hama sub-module, since the graph of message queues and workers
could be implemented using BSP (see also [1]). The first idea is an
online recommendation system based on click-stream data.

If you are interested in this plan, let's discuss it here.

1. http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html

---------- Forwarded message ----------
From: Thomas Jungblut <th...@googlemail.com>
Date: Fri, May 25, 2012 at 4:55 PM
Subject: Re: Online machine learning on top of Hama BSP
To: dev@hama.apache.org


Should we cooperate with the Mahout guys on this? I'm pretty sure they
would have fun with it.
Edward, do you want to ask them?

2012/5/25 Tommaso Teofili <to...@gmail.com>

> Do you have a plan for that Edward?
> A separate package in examples or a separate (online) machine learning
> module? Or something else?
> Regards
> Tommaso
>
> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>
> > OKay, then let's get started.
> >
> > My first idea is simple online recommendation system based on
> click-stream
> > data.
> >
> > On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > <pr...@gmail.com> wrote:
> > > +1
> > >
> > > For those who are interested in ML, please check this. GNU Octave is
> > used.
> > >
> > > https://www.coursera.org/course/ml
> > >
> > > Another session is yet to be announced.
> > >
> > > Thanks,
> > > Praveen
> > >
> > > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > > thomas.jungblut@googlemail.com> wrote:
> > >
> > >> +1
> > >>
> > >> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> > >>
> > >> > and same here :)
> > >> >
> > >> > 2012/5/24 Vaijanath Rao <va...@gmail.com>
> > >> >
> > >> > > +1 me too
> > >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> > sarawgi.aditya@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > +1
> > >> > > > I would be happy to help :)
> > >> > > >
> > >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> > >> edwardyoon@apache.org
> > >> > > > >wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > Does anyone interesting in online machine learning?
> > >> > > > >
> > >> > > > > --
> > >> > > > > Best Regards, Edward J. Yoon
> > >> > > > > @eddieyoon
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Cheers,
> > >> > > > Aditya Sarawgi
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thomas Jungblut
> > >> Berlin <th...@gmail.com>
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>



--
Thomas Jungblut
Berlin <th...@gmail.com>


-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

Should we cooperate with the Mahout guys on this? I'm pretty sure they
would have fun with it.
Edward, do you want to ask them?

2012/5/25 Tommaso Teofili <to...@gmail.com>

> Do you have a plan for that Edward?
> A separate package in examples or a separate (online) machine learning
> module? Or something else?
> Regards
> Tommaso
>
> 2012/5/25 Edward J. Yoon <ed...@apache.org>
>
> > OKay, then let's get started.
> >
> > My first idea is simple online recommendation system based on
> click-stream
> > data.
> >
> > On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > <pr...@gmail.com> wrote:
> > > +1
> > >
> > > For those who are interested in ML, please check this. GNU Octave is
> > used.
> > >
> > > https://www.coursera.org/course/ml
> > >
> > > Another session is yet to be announced.
> > >
> > > Thanks,
> > > Praveen
> > >
> > > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > > thomas.jungblut@googlemail.com> wrote:
> > >
> > >> +1
> > >>
> > >> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> > >>
> > >> > and same here :)
> > >> >
> > >> > 2012/5/24 Vaijanath Rao <va...@gmail.com>
> > >> >
> > >> > > +1 me too
> > >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> > sarawgi.aditya@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > +1
> > >> > > > I would be happy to help :)
> > >> > > >
> > >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> > >> edwardyoon@apache.org
> > >> > > > >wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > Does anyone interesting in online machine learning?
> > >> > > > >
> > >> > > > > --
> > >> > > > > Best Regards, Edward J. Yoon
> > >> > > > > @eddieyoon
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Cheers,
> > >> > > > Aditya Sarawgi
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Thomas Jungblut
> > >> Berlin <th...@gmail.com>
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

Yes, we can express it very easily with the superstep API.

So if you're interested in my neural net, please follow:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/classification/nn/BatchBackpropagationBSP.java


I will start once I have a bigger chunk of time ;)

2012/6/14 Suraj Menon <su...@apache.org>

> Just adding my 2 cents. Thomas, this goes in line with the discussion we
> had recently on how Hama should have a superstep library, where each
> superstep does something that potential user (In this case, our machine
> learning library) can override and use. Few ideas for superstep library:
>
> 1. RealTimeSuperstep (extends Superstep but does not sync)
> 2. MutualBroadcastSuperstep (extends Superstep; used where all the peers
> have to send all their messages to each other. We should employ a peer
> communication strategy such that every peer internally does not have to
> open RPC connection with every other peer)
> 3. Mapper and Reducer(I have one WordCount test running for small set of
> data. Will need more time to increase its scalability, the first step of
> MapReduce would have to use MutualBroadCast.
> 4. OutputCommitter (a Superstep that would write output records to HDFS not
> based on the peer ID)
> 5. IterativeSuperstep (that holds static information on every iteration and
> checkpoints them)
> 6.. more expected as we work on new ideas.
>
>
> -Suraj
>
>
> On Thu, Jun 14, 2012 at 2:45 PM, Thomas Jungblut <
> thomas.jungblut@googlemail.com> wrote:
>
> > I have read a bit about batch neural networks and I think I have found a
> > viable solution for BSP.
> > The funny thing is, that it is the same intuition that my kmeans
> clustering
> > has.
> >
> > Each task is processing on a local block of the data, training a full
> model
> > for itself (making a forward pass and calculating the error of the output
> > neurons against the prediction).
> > Now after you have iterated over all the observations, you are going to
> > send all the weights of your neurons and the error (let's say the average
> > error over all observations) to all the other tasks.
> > After sync, each tasks has #tasks weights for a neuron and the avg
> > prediction error, now the weights are accumulated and the backward step
> > with the error begins.
> > When all weights are backpropagated on each task, you can start reading
> the
> > whole observations again and make the next epoch. (until some minimum
> > average error has been seen or maximum epochs has been reached).
> >
> > Don't know if that is a common pattern in machine learning, but seems to
> me
> > like we can extract some kind of API that helps building local models and
> > combining them again in the next superstep with more information (think
> of
> > the Pregel API with compute, but not on vertex level but on task level).
> >
> > What do you think about that?
> >
> > 2012/6/14 Thomas Jungblut <th...@googlemail.com>
> >
> > > Very cool project, I just need a few vectors and matrices where I will
> > use
> > > my own library first.
> > >
> > > Still having a hard time to distribute the network and update it
> > > accordingly in backprop. If you have smart ideas, let me know.
> > >
> > >
> > > 2012/6/14 Tommaso Teofili <to...@gmail.com>
> > >
> > >> Hi Thomas,
> > >> regarding neural networks I'm also working on it within Apache Yay (my
> > >> Apache labs project [1]) and I agree it'd make sense to run neural
> > network
> > >> algorithms on top of Hama, however at this stage I've just a prototype
> > in
> > >> memory implementation for feedforward (no actual learning) neural
> > >> networks.
> > >> Apart from that I think we need a math/linear algebra package running
> on
> > >> top of Hama to make those algorithms scale nicely.
> > >> I agree we can start from batch and then switch to online machine
> > learning
> > >> algorithms.
> > >> Regards,
> > >> Tommaso
> > >>
> > >> [1] : http://svn.apache.org/repos/asf/labs/yay/trunk/
> > >>
> > >> 2012/6/13 Thomas Jungblut <th...@googlemail.com>
> > >>
> > >> > I'm going to focus still on batch learning, my next aim would be to
> > try
> > >> out
> > >> > neural networks with BSP.
> > >> >
> > >> >
> > >> >
> > >>
> >
> http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=685414&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2FFabs_all.jsp%2FFarnumber%2F685414
> > >> >
> > >> > http://techreports.cs.queensu.ca/files/1997-406.pdf
> > >> >
> > >> > Along with the pSVM we have then two strong learners. If you're
> > >> interested,
> > >> > pass me a private message. But I have to write a few exams next week
> > so
> > >> I'm
> > >> > busy and this is just an idea, we'll see how fast I can get a
> > prototye.
> > >> >
> > >> > Real time is difficult at the moment, we need the out of sync
> > messaging.
> > >> >
> > >> > 2012/6/13 Edward J. Yoon <ed...@apache.org>
> > >> >
> > >> > > Thank you for your sharing!
> > >> > >
> > >> > > On Wed, Jun 13, 2012 at 7:03 PM, Tommaso Teofili
> > >> > > <to...@gmail.com> wrote:
> > >> > > > following up with this discussion on our dev list, I found an
> > >> > > introductory
> > >> > > > pdf to online ML which may be useful [1]
> > >> > > > Apart fromt that we can start by creating the module structure
> in
> > >> hama
> > >> > > svn
> > >> > > > (still the incubator one as the TLP move seems to take a while).
> > >> > > > Regards,
> > >> > > > Tommaso
> > >> > > >
> > >> > > > [1] :
> > >> > http://www.springerlink.com/content/m480047m572t6262/fulltext.pdf
> > >> > > >
> > >> > > > 2012/5/25 Edward J. Yoon <ed...@apache.org>
> > >> > > >
> > >> > > >> I'm roughly thinking to create new module so that I can add 3rd
> > >> party
> > >> > > >> dependencies easily.
> > >> > > >>
> > >> > > >> On Fri, May 25, 2012 at 4:36 PM, Tommaso Teofili
> > >> > > >> <to...@gmail.com> wrote:
> > >> > > >> > Do you have a plan for that Edward?
> > >> > > >> > A separate package in examples or a separate (online) machine
> > >> > learning
> > >> > > >> > module? Or something else?
> > >> > > >> > Regards
> > >> > > >> > Tommaso
> > >> > > >> >
> > >> > > >> > 2012/5/25 Edward J. Yoon <ed...@apache.org>
> > >> > > >> >
> > >> > > >> >> OKay, then let's get started.
> > >> > > >> >>
> > >> > > >> >> My first idea is simple online recommendation system based
> on
> > >> > > >> click-stream
> > >> > > >> >> data.
> > >> > > >> >>
> > >> > > >> >> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> > >> > > >> >> <pr...@gmail.com> wrote:
> > >> > > >> >> > +1
> > >> > > >> >> >
> > >> > > >> >> > For those who are interested in ML, please check this. GNU
> > >> Octave
> > >> > > is
> > >> > > >> >> used.
> > >> > > >> >> >
> > >> > > >> >> > https://www.coursera.org/course/ml
> > >> > > >> >> >
> > >> > > >> >> > Another session is yet to be announced.
> > >> > > >> >> >
> > >> > > >> >> > Thanks,
> > >> > > >> >> > Praveen
> > >> > > >> >> >
> > >> > > >> >> > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > >> > > >> >> > thomas.jungblut@googlemail.com> wrote:
> > >> > > >> >> >
> > >> > > >> >> >> +1
> > >> > > >> >> >>
> > >> > > >> >> >> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> > >> > > >> >> >>
> > >> > > >> >> >> > and same here :)
> > >> > > >> >> >> >
> > >> > > >> >> >> > 2012/5/24 Vaijanath Rao <va...@gmail.com>
> > >> > > >> >> >> >
> > >> > > >> >> >> > > +1 me too
> > >> > > >> >> >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> > >> > > >> >> sarawgi.aditya@gmail.com>
> > >> > > >> >> >> > > wrote:
> > >> > > >> >> >> > >
> > >> > > >> >> >> > > > +1
> > >> > > >> >> >> > > > I would be happy to help :)
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> > >> > > >> >> >> edwardyoon@apache.org
> > >> > > >> >> >> > > > >wrote:
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > > Hi,
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > > > Does anyone interesting in online machine
> learning?
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > > > --
> > >> > > >> >> >> > > > > Best Regards, Edward J. Yoon
> > >> > > >> >> >> > > > > @eddieyoon
> > >> > > >> >> >> > > > >
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > > > --
> > >> > > >> >> >> > > > Cheers,
> > >> > > >> >> >> > > > Aditya Sarawgi
> > >> > > >> >> >> > > >
> > >> > > >> >> >> > >
> > >> > > >> >> >> >
> > >> > > >> >> >>
> > >> > > >> >> >>
> > >> > > >> >> >>
> > >> > > >> >> >> --
> > >> > > >> >> >> Thomas Jungblut
> > >> > > >> >> >> Berlin <th...@gmail.com>
> > >> > > >> >> >>
> > >> > > >> >>
> > >> > > >> >>
> > >> > > >> >>
> > >> > > >> >> --
> > >> > > >> >> Best Regards, Edward J. Yoon
> > >> > > >> >> @eddieyoon
> > >> > > >> >>
> > >> > > >>
> > >> > > >>
> > >> > > >>
> > >> > > >> --
> > >> > > >> Best Regards, Edward J. Yoon
> > >> > > >> @eddieyoon
> > >> > > >>
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Best Regards, Edward J. Yoon
> > >> > > @eddieyoon
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Thomas Jungblut
> > >> > Berlin <th...@gmail.com>
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Thomas Jungblut
> > > Berlin <th...@gmail.com>
> > >
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <th...@gmail.com>
> >
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Suraj Menon <su...@apache.org>.

Just adding my 2 cents. Thomas, this is in line with the discussion we
had recently on how Hama should have a superstep library, where each
superstep does something that a potential user (in this case, our machine
learning library) can override and use. A few ideas for the superstep library:

1. RealTimeSuperstep (extends Superstep but does not sync)
2. MutualBroadcastSuperstep (extends Superstep; used where all the peers
have to send all their messages to each other. We should employ a peer
communication strategy such that each peer does not have to open an RPC
connection to every other peer internally)
3. Mapper and Reducer (I have one WordCount test running on a small set of
data. It will need more time to improve its scalability; the first step of
MapReduce would have to use MutualBroadcastSuperstep.)
4. OutputCommitter (a Superstep that would write output records to HDFS not
based on the peer ID)
5. IterativeSuperstep (that holds static information on every iteration and
checkpoints it)
6. More are expected as we work on new ideas.
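
To make ideas 1 and 2 a bit more concrete, here is a rough, self-contained
sketch in plain Java. It does not use the Hama API: SketchPeer,
SketchSuperstep, RealTimeSketchSuperstep, MutualBroadcastSketchSuperstep and
SuperstepSketch are all hypothetical names made up for illustration, and the
"barrier" is just a loop that moves pending messages into each inbox. It also
does not model the RPC connection strategy mentioned in idea 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a BSP peer; this is not Hama's BSPPeer. */
final class SketchPeer {
    final int id;
    final List<String> inbox = new ArrayList<>();   // messages readable in the current superstep
    final List<String> pending = new ArrayList<>(); // messages delivered at the next barrier
    SketchPeer(int id) { this.id = id; }
}

/** Hypothetical base class: one unit of work, followed by a barrier by default. */
abstract class SketchSuperstep {
    abstract void compute(SketchPeer self, List<SketchPeer> all);
    boolean needsSync() { return true; }
}

/** Idea 1: a superstep that does its work but skips the barrier. */
abstract class RealTimeSketchSuperstep extends SketchSuperstep {
    @Override boolean needsSync() { return false; }
}

/** Idea 2: every peer queues one message for every other peer. */
final class MutualBroadcastSketchSuperstep extends SketchSuperstep {
    @Override void compute(SketchPeer self, List<SketchPeer> all) {
        for (SketchPeer other : all) {
            if (other.id != self.id) {
                other.pending.add("state of peer " + self.id);
            }
        }
    }
}

final class SuperstepSketch {
    /** Runs the supersteps in order; peers run sequentially here instead of in parallel. */
    static void run(List<SketchPeer> peers, List<SketchSuperstep> steps) {
        for (SketchSuperstep step : steps) {
            for (SketchPeer peer : peers) {
                step.compute(peer, peers);
            }
            if (step.needsSync()) {
                // Simulated barrier: pending messages become visible to their receiver.
                for (SketchPeer peer : peers) {
                    peer.inbox.clear();
                    peer.inbox.addAll(peer.pending);
                    peer.pending.clear();
                }
            }
        }
    }

    /** Runs a single mutual broadcast and returns the inbox size of peer 0. */
    static int inboxSizeAfterBroadcast(int numPeers) {
        List<SketchPeer> peers = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            peers.add(new SketchPeer(i));
        }
        List<SketchSuperstep> steps = new ArrayList<>();
        steps.add(new MutualBroadcastSketchSuperstep());
        run(peers, steps);
        return peers.get(0).inbox.size();
    }

    public static void main(String[] args) {
        System.out.println(inboxSizeAfterBroadcast(3));
    }
}
```

With three peers, each inbox holds two messages after the broadcast step. A
user library would subclass the base class and chain steps in the list passed
to run; whether the real library should look like this is an open question.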


-Suraj


On Thu, Jun 14, 2012 at 2:45 PM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> I have read a bit about batch neural networks and I think I have found a
> viable solution for BSP.
> The funny thing is, that it is the same intuition that my kmeans clustering
> has.
>
> Each task is processing on a local block of the data, training a full model
> for itself (making a forward pass and calculating the error of the output
> neurons against the prediction).
> Now after you have iterated over all the observations, you are going to
> send all the weights of your neurons and the error (let's say the average
> error over all observations) to all the other tasks.
> After sync, each tasks has #tasks weights for a neuron and the avg
> prediction error, now the weights are accumulated and the backward step
> with the error begins.
> When all weights are backpropagated on each task, you can start reading the
> whole observations again and make the next epoch. (until some minimum
> average error has been seen or maximum epochs has been reached).
>
> Don't know if that is a common pattern in machine learning, but seems to me
> like we can extract some kind of API that helps building local models and
> combining them again in the next superstep with more information (think of
> the Pregel API with compute, but not on vertex level but on task level).
>
> What do you think about that?
>
> 2012/6/14 Thomas Jungblut <th...@googlemail.com>
>
> > Very cool project, I just need a few vectors and matrices where I will
> use
> > my own library first.
> >
> > Still having a hard time to distribute the network and update it
> > accordingly in backprop. If you have smart ideas, let me know.
> >
> >
> > 2012/6/14 Tommaso Teofili <to...@gmail.com>
> >
> >> Hi Thomas,
> >> regarding neural networks I'm also working on it within Apache Yay (my
> >> Apache labs project [1]) and I agree it'd make sense to run neural
> network
> >> algorithms on top of Hama, however at this stage I've just a prototype
> in
> >> memory implementation for feedforward (no actual learning) neural
> >> networks.
> >> Apart from that I think we need a math/linear algebra package running on
> >> top of Hama to make those algorithms scale nicely.
> >> I agree we can start from batch and then switch to online machine
> learning
> >> algorithms.
> >> Regards,
> >> Tommaso
> >>
> >> [1] : http://svn.apache.org/repos/asf/labs/yay/trunk/
> >>
> >> 2012/6/13 Thomas Jungblut <th...@googlemail.com>
> >>
> >> > I'm going to focus still on batch learning, my next aim would be to
> try
> >> out
> >> > neural networks with BSP.
> >> >
> >> >
> >> >
> >>
> http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=685414&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2FFabs_all.jsp%2FFarnumber%2F685414
> >> >
> >> > http://techreports.cs.queensu.ca/files/1997-406.pdf
> >> >
> >> > Along with the pSVM we have then two strong learners. If you're
> >> interested,
> >> > pass me a private message. But I have to write a few exams next week
> so
> >> I'm
> >> > busy and this is just an idea, we'll see how fast I can get a
> prototye.
> >> >
> >> > Real time is difficult at the moment, we need the out of sync
> messaging.
> >> >
> >> > 2012/6/13 Edward J. Yoon <ed...@apache.org>
> >> >
> >> > > Thank you for your sharing!
> >> > >
> >> > > On Wed, Jun 13, 2012 at 7:03 PM, Tommaso Teofili
> >> > > <to...@gmail.com> wrote:
> >> > > > following up with this discussion on our dev list, I found an
> >> > > introductory
> >> > > > pdf to online ML which may be useful [1]
> >> > > > Apart fromt that we can start by creating the module structure in
> >> hama
> >> > > svn
> >> > > > (still the incubator one as the TLP move seems to take a while).
> >> > > > Regards,
> >> > > > Tommaso
> >> > > >
> >> > > > [1] :
> >> > http://www.springerlink.com/content/m480047m572t6262/fulltext.pdf
> >> > > >
> >> > > > 2012/5/25 Edward J. Yoon <ed...@apache.org>
> >> > > >
> >> > > >> I'm roughly thinking to create new module so that I can add 3rd
> >> party
> >> > > >> dependencies easily.
> >> > > >>
> >> > > >> On Fri, May 25, 2012 at 4:36 PM, Tommaso Teofili
> >> > > >> <to...@gmail.com> wrote:
> >> > > >> > Do you have a plan for that Edward?
> >> > > >> > A separate package in examples or a separate (online) machine
> >> > learning
> >> > > >> > module? Or something else?
> >> > > >> > Regards
> >> > > >> > Tommaso
> >> > > >> >
> >> > > >> > 2012/5/25 Edward J. Yoon <ed...@apache.org>
> >> > > >> >
> >> > > >> >> OKay, then let's get started.
> >> > > >> >>
> >> > > >> >> My first idea is simple online recommendation system based on
> >> > > >> click-stream
> >> > > >> >> data.
> >> > > >> >>
> >> > > >> >> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> >> > > >> >> <pr...@gmail.com> wrote:
> >> > > >> >> > +1
> >> > > >> >> >
> >> > > >> >> > For those who are interested in ML, please check this. GNU
> >> Octave
> >> > > is
> >> > > >> >> used.
> >> > > >> >> >
> >> > > >> >> > https://www.coursera.org/course/ml
> >> > > >> >> >
> >> > > >> >> > Another session is yet to be announced.
> >> > > >> >> >
> >> > > >> >> > Thanks,
> >> > > >> >> > Praveen
> >> > > >> >> >
> >> > > >> >> > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> >> > > >> >> > thomas.jungblut@googlemail.com> wrote:
> >> > > >> >> >
> >> > > >> >> >> +1
> >> > > >> >> >>
> >> > > >> >> >> 2012/5/24 Tommaso Teofili <to...@gmail.com>
> >> > > >> >> >>
> >> > > >> >> >> > and same here :)
> >> > > >> >> >> >
> >> > > >> >> >> > 2012/5/24 Vaijanath Rao <va...@gmail.com>
> >> > > >> >> >> >
> >> > > >> >> >> > > +1 me too
> >> > > >> >> >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi" <
> >> > > >> >> sarawgi.aditya@gmail.com>
> >> > > >> >> >> > > wrote:
> >> > > >> >> >> > >
> >> > > >> >> >> > > > +1
> >> > > >> >> >> > > > I would be happy to help :)
> >> > > >> >> >> > > >
> >> > > >> >> >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <
> >> > > >> >> >> edwardyoon@apache.org
> >> > > >> >> >> > > > >wrote:
> >> > > >> >> >> > > >
> >> > > >> >> >> > > > > Hi,
> >> > > >> >> >> > > > >
> >> > > >> >> >> > > > > Does anyone interesting in online machine learning?
> >> > > >> >> >> > > > >
> >> > > >> >> >> > > > > --
> >> > > >> >> >> > > > > Best Regards, Edward J. Yoon
> >> > > >> >> >> > > > > @eddieyoon
> >> > > >> >> >> > > > >
> >> > > >> >> >> > > >
> >> > > >> >> >> > > >
> >> > > >> >> >> > > >
> >> > > >> >> >> > > > --
> >> > > >> >> >> > > > Cheers,
> >> > > >> >> >> > > > Aditya Sarawgi
> >> > > >> >> >> > > >
> >> > > >> >> >> > >
> >> > > >> >> >> >
> >> > > >> >> >>
> >> > > >> >> >>
> >> > > >> >> >>
> >> > > >> >> >> --
> >> > > >> >> >> Thomas Jungblut
> >> > > >> >> >> Berlin <th...@gmail.com>
> >> > > >> >> >>
> >> > > >> >>
> >> > > >> >>
> >> > > >> >>
> >> > > >> >> --
> >> > > >> >> Best Regards, Edward J. Yoon
> >> > > >> >> @eddieyoon
> >> > > >> >>
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> --
> >> > > >> Best Regards, Edward J. Yoon
> >> > > >> @eddieyoon
> >> > > >>
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Best Regards, Edward J. Yoon
> >> > > @eddieyoon
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thomas Jungblut
> >> > Berlin <th...@gmail.com>
> >> >
> >>
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <th...@gmail.com>
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

I have read a bit about batch neural networks and I think I have found a
viable solution for BSP.
The funny thing is that it follows the same intuition as my k-means
clustering.

Each task processes a local block of the data and trains a full model
for itself (making a forward pass and calculating the error of the output
neurons against the prediction).
After iterating over all of its observations, a task sends the weights of
all its neurons and the error (let's say the average error over all
observations) to all the other tasks.
After the sync, each task has #tasks weights for each neuron and the average
prediction errors; the weights are then accumulated and the backward step
with the error begins.
Once the weights have been backpropagated on each task, you can read all
the observations again and run the next epoch (until some minimum
average error has been seen or the maximum number of epochs has been reached).
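
To make the epoch/sync loop above concrete, here is a minimal sketch under
loud assumptions: it trains a single linear neuron instead of a multi-layer
network, it treats "accumulating" the weights as a plain element-wise
average, and it simulates the tasks with an ordinary loop rather than using
any Hama API. The class and method names (BspAveragingSketch, localEpoch,
average, train) are hypothetical and only illustrate the pattern.

```java
/** Simulates the pattern: local epoch per task, exchange weights, average, repeat. */
final class BspAveragingSketch {

  /** One local pass over a task's block; updates w in place, returns the mean squared error. */
  static double localEpoch(double[] w, double[][] xs, double[] ys, double lr) {
    double sumSq = 0.0;
    for (int i = 0; i < xs.length; i++) {
      double out = 0.0;
      for (int j = 0; j < w.length; j++) {
        out += w[j] * xs[i][j];            // forward pass of the single neuron
      }
      double err = out - ys[i];
      sumSq += err * err;
      for (int j = 0; j < w.length; j++) {
        w[j] -= lr * err * xs[i][j];       // gradient step on this observation
      }
    }
    return sumSq / xs.length;
  }

  /** What every task would compute after the barrier: the mean of all received weight vectors. */
  static double[] average(double[][] weightsPerTask) {
    double[] avg = new double[weightsPerTask[0].length];
    for (double[] w : weightsPerTask) {
      for (int j = 0; j < avg.length; j++) {
        avg[j] += w[j] / weightsPerTask.length;
      }
    }
    return avg;
  }

  /** Runs epochs until the average error drops below minError or maxEpochs is reached. */
  static double[] train(double[][][] blocksX, double[][] blocksY, int dim,
                        double lr, double minError, int maxEpochs) {
    int tasks = blocksX.length;
    double[] shared = new double[dim];
    for (int epoch = 0; epoch < maxEpochs; epoch++) {
      double[][] local = new double[tasks][];
      double errSum = 0.0;
      for (int t = 0; t < tasks; t++) {    // in BSP these would run in parallel
        local[t] = shared.clone();
        errSum += localEpoch(local[t], blocksX[t], blocksY[t], lr);
      }
      shared = average(local);             // stands in for send + sync + accumulate
      if (errSum / tasks < minError) {
        break;
      }
    }
    return shared;
  }
}
```

In a real BSP job the inner loop over tasks would disappear: each peer would
run localEpoch on its own block, send its weights and error to the others,
and call average after the barrier.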

I don't know if that is a common pattern in machine learning, but it seems
to me that we could extract some kind of API that helps with building local
models and combining them again in the next superstep with more information
(think of the Pregel API with compute, but at the task level rather than
the vertex level).

What do you think about that?

-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

Very cool project. I just need a few vectors and matrices, for which I will
use my own library first.

I'm still having a hard time distributing the network and updating it
accordingly in backprop. If you have any smart ideas, let me know.

-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Tommaso Teofili <to...@gmail.com>.

Hi Thomas,
Regarding neural networks, I'm also working on them within Apache Yay (my
Apache Labs project [1]), and I agree it'd make sense to run neural network
algorithms on top of Hama. However, at this stage I have just a prototype
in-memory implementation of feedforward neural networks (no actual learning).
Apart from that, I think we need a math/linear algebra package running on
top of Hama to make those algorithms scale nicely.
I agree we can start with batch and then switch to online machine learning
algorithms.
Regards,
Tommaso

[1] : http://svn.apache.org/repos/asf/labs/yay/trunk/

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

We can separate them into two packages in the ML module, like this:

Traditional batch machine learning algorithms:
 - org.apache.hama.ml.batch

Online machine learning algorithms for massive and streaming data:
 - org.apache.hama.ml.online

The biggest advantage is that we can share reusable code between the two packages.
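
As a rough illustration of what "reusable code between the two packages"
could look like, here is a minimal sketch. Everything in it is an
assumption: the interface IncrementalModel and the classes LinearModel,
BatchTrainer and OnlineTrainer are hypothetical names, not existing Hama
classes. The idea is only that one model contract is shared while the batch
side makes repeated passes over stored data and the online side updates once
per arriving example.

```java
/** Hypothetical contract shared by both packages. */
interface IncrementalModel {
  double predict(double[] x);
  void update(double[] x, double y);   // learn from one labelled example
}

/** A single linear neuron trained with a per-example gradient step. */
final class LinearModel implements IncrementalModel {
  private final double[] w;
  private final double lr;

  LinearModel(int dim, double lr) {
    this.w = new double[dim];
    this.lr = lr;
  }

  public double predict(double[] x) {
    double s = 0.0;
    for (int j = 0; j < w.length; j++) {
      s += w[j] * x[j];
    }
    return s;
  }

  public void update(double[] x, double y) {
    double err = predict(x) - y;
    for (int j = 0; j < w.length; j++) {
      w[j] -= lr * err * x[j];
    }
  }
}

/** Would live in the batch package: repeated passes over a fixed data set. */
final class BatchTrainer {
  static void fit(IncrementalModel m, double[][] xs, double[] ys, int epochs) {
    for (int e = 0; e < epochs; e++) {
      for (int i = 0; i < xs.length; i++) {
        m.update(xs[i], ys[i]);
      }
    }
  }
}

/** Would live in the online package: one update per example, nothing stored. */
final class OnlineTrainer {
  private final IncrementalModel model;
  private long seen;

  OnlineTrainer(IncrementalModel model) {
    this.model = model;
  }

  void observe(double[] x, double y) {
    model.update(x, y);
    seen++;
  }

  long seen() {
    return seen;
  }
}
```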

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

I'm still going to focus on batch learning; my next aim is to try out
neural networks with BSP.

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=685414&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2FFabs_all.jsp%2FFarnumber%2F685414

http://techreports.cs.queensu.ca/files/1997-406.pdf

Along with the pSVM, we would then have two strong learners. If you're
interested, send me a private message. But I have to write a few exams next
week, so I'm busy, and this is just an idea; we'll see how fast I can get a
prototype.

Real time is difficult at the moment; we need out-of-sync messaging.

-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

Thank you for sharing!

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Tommaso Teofili <to...@gmail.com>.

Following up on this discussion on our dev list, I found an introductory
PDF on online ML which may be useful [1].
Apart from that, we can start by creating the module structure in the Hama
svn (still the incubator one, as the TLP move seems to be taking a while).
Regards,
Tommaso

[1] : http://www.springerlink.com/content/m480047m572t6262/fulltext.pdf

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

I'm roughly thinking of creating a new module so that I can add 3rd-party
dependencies easily.

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Tommaso Teofili <to...@gmail.com>.

Do you have a plan for that, Edward?
A separate package in examples or a separate (online) machine learning
module? Or something else?
Regards
Tommaso

2012/5/25 Edward J. Yoon <ed...@apache.org>

> Okay, then let's get started.
>
> My first idea is a simple online recommendation system based on
> click-stream data.
>

Re: Online machine learning on top of Hama BSP

Posted by "Edward J. Yoon" <ed...@apache.org>.

Okay, then let's get started.

My first idea is a simple online recommendation system based on click-stream data.

On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
<pr...@gmail.com> wrote:
> +1
>
> For those who are interested in ML, please check this. GNU Octave is used.
>
> https://www.coursera.org/course/ml
>
> Another session is yet to be announced.
>
> Thanks,
> Praveen
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Online machine learning on top of Hama BSP

Posted by Praveen Sripati <pr...@gmail.com>.

+1

For those who are interested in ML, please check this. GNU Octave is used.

https://www.coursera.org/course/ml

Another session is yet to be announced.

Thanks,
Praveen

On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> +1
>

Re: Online machine learning on top of Hama BSP

Posted by Thomas Jungblut <th...@googlemail.com>.

+1

2012/5/24 Tommaso Teofili <to...@gmail.com>

> and same here :)
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Online machine learning on top of Hama BSP

Posted by Tommaso Teofili <to...@gmail.com>.

and same here :)

2012/5/24 Vaijanath Rao <va...@gmail.com>

> +1 me too

Re: Online machine learning on top of Hama BSP

Posted by Vaijanath Rao <va...@gmail.com>.

+1 me too
On May 23, 2012 10:26 PM, "Aditya Sarawgi" <sa...@gmail.com> wrote:

> +1
> I would be happy to help :)
>

Re: Online machine learning on top of Hama BSP

Posted by Aditya Sarawgi <sa...@gmail.com>.

+1
I would be happy to help :)

On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <ed...@apache.org> wrote:

> Hi,
>
> Is anyone interested in online machine learning?
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>



-- 
Cheers,
Aditya Sarawgi