Posted to dev@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2013/11/12 14:04:37 UTC

Cloudera announces Oryx

Sean writes:

> We release Oryx today -- get some.
> #cloudera <https://plus.google.com/s/%23cloudera>
> #oryx <https://plus.google.com/s/%23oryx>
> The Oryx open source project provides simple, real-time large-scale
> machine learning infrastructure. It implements a few classes of algorithm
> commonly used in business applications: collaborative filtering /
> recommendation, classification / regression, and clustering. It can
> continuously build models from a stream of data at large scale using Apache
> Hadoop's MapReduce. It also serves queries of those models in real-time via
> an HTTP REST API, and can update models approximately in response to new
> data. Models are exchanged in PMML format.



I personally find it a pity that Cloudera talks the open source talk, but
doesn't walk the walk by contributing to, for example, Mahout.

Their decision.

Sean's decision as well, I guess.

Re: Cloudera announces Oryx

Posted by Isabel Drost-Fromm <is...@apache.org>.
On Tuesday, November 12, 2013 10:33:39 AM Amir Sedighi wrote:
> Seems Oryx is a Cloudera version of Myrrix.  Is there any improvement list?

For general design, how about using dev@mahout? For specific needs, how about 
filing tickets in JIRA?

The best way to get improvements not only talked about but actually done is to 
get active in submitting patches, be it tests only, documentation, examples or 
code. For more background, also check out:

http://markmail.org/thread/jhdjlrom2jvcjx5v


Isabel


Re: Cloudera announces Oryx

Posted by Amir Sedighi <am...@yahoo.com>.
Seems Oryx is a Cloudera version of Myrrix.  Is there any improvement list?

Regards,
Amir.




On Tuesday, November 12, 2013 8:46 PM, Isabel Drost-Fromm <is...@apache.org> wrote:

> On Tuesday, November 12, 2013 04:27:48 PM Sean Owen wrote:
> > I like the benchmark sentiment. The two projects actually have little
> > overlap in functionality, which is the essence of the reason why it's
> > a different project.
>
> One starting point for discussion on this dev list that I would find valuable
> is answering two questions:
>
> Oryx seems to support pretty much the exact same three use cases as Apache
> Mahout: clustering, classification and recommendations. When would you advise
> people to use one, and when the other?
>
> I think asking this question could also lead us closer to an answer to what I
> believe Manuel is getting at with his request for benchmarks: from a user's
> perspective, what are the factors that determine when to go for which? Is
> there anything Mahout can learn technologically here?
>
> One last thing to keep in mind during the discussion: both projects are Apache
> licensed. So if there are people putting valuable code into Oryx, beautiful:
> all the more code that Mahout can benefit from as well ;)
>
>
> Isabel

Re: Cloudera announces Oryx

Posted by Andrew Musselman <an...@gmail.com>.
I'd like to congratulate Sean and Cloudera on shipping a system that does a
few things well and then lets you put them into production easily.

This feels like the direction Mahout ought to go as well, and the group's
been going toward a simpler system recently.

My reason for using Mahout is that it has a good array of tools, uses Colt,
etc., but it is still hard for most people to use.  That's something to
improve.

Best
Andrew

Re: Cloudera announces Oryx

Posted by Isabel Drost-Fromm <is...@apache.org>.
On Tuesday, November 12, 2013 04:27:48 PM Sean Owen wrote:
> I like the benchmark sentiment. The two projects actually have little
> overlap in functionality, which is the essence of the reason why it's
> a different project.

One starting point for discussion on this dev list that I would find valuable 
is answering two questions: 

Oryx seems to support pretty much the exact same three use cases as Apache 
Mahout: clustering, classification and recommendations. When would you advise 
people to use one, and when the other? 

I think asking this question could also lead us closer to an answer to what I 
believe Manuel is getting at with his request for benchmarks: from a user's 
perspective, what are the factors that determine when to go for which? Is 
there anything Mahout can learn technologically here?

One last thing to keep in mind during the discussion: both projects are Apache 
licensed. So if there are people putting valuable code into Oryx, beautiful: 
all the more code that Mahout can benefit from as well ;)


Isabel


Re: Cloudera announces Oryx

Posted by Sean Owen <sr...@gmail.com>.
On Tue, Nov 12, 2013 at 4:02 PM, Manuel Blechschmidt
<Ma...@gmx.de> wrote:
> It would be nice if Cloudera could publish some benchmarks. Cloudera vs. Mahout vs. SAP HANA PAL vs. SPSS to give somebody the chance to enhance Mahout so that it can catch up.

Does this need to be a "versus" thing? I and other engs here did a
fair bit of work to keep the Mahout code working in CDH5 / Hadoop 2.2,
and contributed that back. For a company apparently trying to
undermine Mahout we're not very good at it...

I like the benchmark sentiment. The two projects actually have little
overlap in functionality, which is the essence of the reason why it's
a different project. Oryx has nothing but RDF, kmeans++, and ALS. No
visualization, no text processing tools. No library-like interfaces.

On the other hand the piece of the puzzle Oryx is trying to add (model
serving) has no counterpart in this project, with possible exception
of Taste. So there's not much to compare with a benchmark.

In-memory pretty well always beats Hadoop. I can tell you that I think
the ALS in Mahout is *faster*, I'm pretty sure mostly because it loads a
bunch into memory. But the in-memory ALS in Oryx is of course faster
by an order of magnitude than both. How do you want to benchmark that?

I have never used SPSS or HANA's offering here, but am willing to bet
it's wicked fast without even bothering to measure.

I'm not even sure speed is the only or main point? Things like
usability out of the box top my list. And being open source and
working with data in HDFS.

Re: Cloudera announces Oryx

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hello Ted, hello Sean,
I appreciate the work of both of you. I do not contribute any code myself; I am just spreading the word about Mahout and creating some documentation and demo projects.

I can understand that it is difficult to integrate the interests of an employer and of an open source project.

Nevertheless, I believe in commercial offerings and in competition: more competition will in the end create better products. Sometimes it makes sense to keep temporary secrets, sometimes it doesn't; as far as I know, research has not found evidence on this yet.

On 12.11.2013, at 09:13, Ted Dunning wrote:

> On Tue, Nov 12, 2013 at 1:46 PM, Sean Owen <sr...@gmail.com> wrote:
> 
>> Mahout has served well for a long time as measured in Hadoop-years --
>> like 4+ years. It's still in usable life. I don't think the current
>> state of the code means it's feasible to truly evolve it towards
>> things like Hadoop 2, Spark, real-time.
> 
> 

It would be nice if Cloudera could publish some benchmarks. Cloudera vs. Mahout vs. SAP HANA PAL vs. SPSS to give somebody the chance to enhance Mahout so that it can catch up.

It would be great if the current discussion could lead to some beneficial competition like the Netflix Prize.

Thanks a lot
    Manuel

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Cloudera announces Oryx

Posted by Isabel Drost-Fromm <is...@apache.org>.
Sebastian, thanks for providing your perspective.

On Tuesday, November 12, 2013 07:18:43 PM Sebastian Schelter wrote:
> The lead developer of Impala answered the question whether Impala accepts
> patches with the statement that Impala is developed by Cloudera engineers
> and others can only look at the source code on github...

I'm intentionally singling out only this comment and putting my ASF member hat 
on for a second: there are a couple of ways of running open source projects.

Compared to how large corporations in the past abused the term open source* to 
turn it into "shared source", what you describe above actually isn't so bad 
at all: 

The license is a valid (as in OSI approved) open source license that gives 
users of the project all four freedoms**. So if someone wants to change the 
project in a way that is not compatible with the way Cloudera wants to drive 
the project, they can simply clone the project, make modifications and pull 
changes from upstream in a timely manner. 

If there's more than one such person it's easy enough to establish a new 
community leader who clones the project, fetches and merges from upstream in a 
timely manner and manages patches from downstream contributors. In order to 
make this fly it needs a bit of experience, a whole lot of dedication, quite 
some time and some knowledge in herding cats^W^W^W with managing a community 
of developers.

Essentially it's not unlike other open source projects controlled by a single 
entity only: you end up with a benevolent dictatorship where the dictator's 
power is limited by the forkability of his project.



The way Mahout works is fundamentally different: Instead of having one entity 
with full control over development direction those actively contributing to 
the project earn merit over time - with merit comes more influence on the code 
base (and, granted, more responsibility).

This particular aspect may seem incredibly unimportant when you are starting a 
new project at your employer. However software projects tend to live much 
longer than originally anticipated. So if you bet part of your business on an 
open source project, it seems worthwhile to put some thought into how you can 
assure that the project isn't going into a direction that is completely at 
odds with what you want to do.

Of course with this freedom comes responsibility: giving a say in the project 
to multiple entities means that there is more need for communication before 
reaching consensus and going with that decision. On the other hand, getting to 
the point where one can influence the project requires active participation, 
much like Mahout itself depends crucially on its users contributing code, 
documentation and help on mailing lists. See also the following two texts on 
how open source works; though the umbrella project that published them is no 
longer active, the content is still relevant today:

http://jakarta.apache.org/site/contributing.html
http://jakarta.apache.org/site/understandingopensource.html

For founders, openness means giving up direct control over what they may 
consider their personal baby. However, it also comes with the virtue of 
knowing that the project has a chance to survive the founder's interest in and 
time for it, essentially giving it a chance to stand and live on its own.


It may be my personal bias, but I have met many people (not counting Apache 
Software Foundation people, obviously) who value having the freedom to 
participate, to drive the project further and to influence the project 
themselves, in particular when betting parts of their business on that 
project.

For those interested in a bit more background information on the typical 
options for open source governance, it might be valuable to take a look at 
the following text:

http://producingoss.com/en/producingoss.html#social-infrastructure


Sure, it's sad for Mahout to see someone very capable leaving. It's also sad to 
see a project being founded with a proposal that for the uninitiated is hard to 
distinguish from Mahout's offering. However, sometimes it's better to try out 
new ideas in a space independent of the original project as opposed to 
becoming ever more grumpy about not being able to pursue one's own ideas of how 
things should be. Who knows, given how different the two projects are according 
to Sean, maybe there's room for mutual benefit after all. I certainly welcome 
contributions in that direction.



Isabel



* http://en.wikipedia.org/wiki/Shared_source
** http://fsfe.org/freesoftware/basics/4freedoms.en.html 

Re: Cloudera announces Oryx

Posted by Sean Owen <sr...@gmail.com>.
On Tue, Nov 12, 2013 at 6:18 PM, Sebastian Schelter <ss...@apache.org> wrote:
> However, I also cannot understand why Cloudera and you need to start a
> new open source project that in many ways mirrors what Mahout offers.
> Why not contribute the algorithm implementations (the computation layer)
> to Mahout and build the serving layer as a project on top of that? I
> don't see what would have prevented this; I would think it would have
> been warmly welcomed by this community.
>
> It is not that this new project creates competition from which users
> will benefit, it's exactly the opposite. To me it feels like an
> intentional abandonment of Mahout. Instead of giving users a single
> project where we could have united efforts, users now have to choose
> between two things that in general do the same things with each of them
> missing some functionality. In my eyes, users lose here.

I disagree that these are apples-to-apples alternatives. Oryx does a
lot that Mahout doesn't and vice versa. That's the main reason it's a
separate entity.

The Oryx implementation is quite different, even if there is some
overlap. Putting two different implementations of the same thing
alongside each other, when they're not related, doesn't solve a
problem of choice and seems confusing. You could well ask, why is
there a need for a different implementation? That's a different
conversation.

There is more code today than yesterday not less. At worst, n+1
choices, not n. Nobody removed Mahout code.

I hear an implication that there is no legitimate reason to put energy
into something else. I think it is just as fair to ask whether the state
of the code and community here plays no part in deciding that for people
out there.


> I can also understand Ted's worries about Cloudera's attitude towards
> open source, after having heard Impala's view of "open source" at the
> last Buzzwords (the lead developer of Impala answered the question
> whether Impala accepts patches with the statement that Impala is
> developed by Cloudera engineers and others can only look at the source
> code on github...). I hope that Oryx chooses another path (I also hope
> this for Impala).

It depends if anyone wants to contribute and whether the contribution
makes sense. Whether you like Marcel's stance on open source is a
different and interesting question -- won't engage it here to spare
the list.

Re: Cloudera announces Oryx

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Tue, Nov 12, 2013 at 10:18 AM, Sebastian Schelter <ss...@apache.org> wrote:

>
>
> @Sean
>
> However, I also cannot understand why Cloudera and you need to start a
> new open source project that in many ways mirrors what Mahout offers.
> Why not contribute the algorithm implementations (the computation layer)
> to Mahout and build the serving layer as a project on top of that? I
> don't see what would have prevented this; I would think it would have
> been warmly welcomed by this community.
>
> I can also understand Ted's worries about Cloudera's attitude towards
> open source, after having heard Impala's view of "open source" at the
> last Buzzwords (the lead developer of Impala answered the question
> whether Impala accepts patches with the statement that Impala is
> developed by Cloudera engineers and others can only look at the source
> code on github...). I hope that Oryx chooses another path (I also hope
> this for Impala).
>
> It's a very bad day for Mahout today.
>

Second that -- it's all about control. It always has been. Even when
rivals of Cloudera's model have been forced to work together on a project,
it has always been about committership clout. The next logical step is to
cut all foreign committers and claim (sell) the utter domain expertise. It
is not bad, it's just a business model. There are tons of companies that
try to build Drill-like brute force machine clones today too, even if they
open some of what they do.

Even Amplab is sort of that way, keeping control over architecture during
decision times and then dumping it on the community once all important
calls are already made. Again, nothing wrong with it, it is all open and
free after all. And a healthy share of dictatorship cuts the dev effort.

However, let's call it out for what it is. Open source usually means
community. Just because something got opened doesn't mean there's a
community outside its initial effort, but I suppose customers still like
the word "open" without making much of a distinction about the "community"
part of it.

-d


>
> --sebastian
>
> PS:
>
> I still have to comment on this statement: "I don't think the current
> state of the code means it's feasible to truly evolve it towards things
> like Hadoop 2, Spark, real-time."
>
> To me this sounds like a marketing statement, "look, we can give you
> something better than Mahout". Porting Mahout's algorithms to Spark is
> something that can be done with very little effort; I ported
> RowSimilarityJob in a single evening recently as a getting-started
> exercise with Spark. Making the codebase ready is only a matter of the
> will to invest time and effort.
>

+1111. The statement Sebastian is referring to couldn't be farther from the
truth. We use Mahout as building blocks in the Spark framework, including
Pregel and GraphX (the latter is still under development at AMPLab). We are,
and will remain, on the hook to contribute that back to Mahout. (Well, we are
still hashing out what we move and what we don't, but I am moving some parts
of it over on GitHub, per Mahout issue.)

All the difference is really whether one decides to contribute it, subject
to peer review, or just keeps saying "it's not possible".



Re: Cloudera announces Oryx

Posted by Sebastian Schelter <ss...@apache.org>.
@Ted

I don't see Cloudera buying Sean out of Mahout. As I recall it, Sean
stepped down as PMC Chair after a discussion on the future of Mahout,
in which his vision for the project did not concur with that of the
others. He reduced his engagement with Mahout and built Myrrix first
on his own. In my eyes, Oryx looks like Myrrix + classification and
clustering.


@Sean

However, I also cannot understand why Cloudera and you need to start a
new open source project that in many ways mirrors what Mahout offers.
Why not contribute the algorithm implementations (the computation layer)
to Mahout and build the serving layer as a project on top of that? I
don't see what would have prevented this; I would think it would have
been warmly welcomed by this community.

It is not that this new project creates competition from which users
will benefit, it's exactly the opposite. To me it feels like an
intentional abandonment of Mahout. Instead of giving users a single
project where we could have united efforts, users now have to choose
between two things that in general do the same things with each of them
missing some functionality. In my eyes, users lose here.

I can also understand Ted's worries about Cloudera's attitude towards
open source, after having heard Impala's view of "open source" at the
last Buzzwords (the lead developer of Impala answered the question
whether Impala accepts patches with the statement that Impala is
developed by Cloudera engineers and others can only look at the source
code on github...). I hope that Oryx chooses another path (I also hope
this for Impala).

It's a very bad day for Mahout today.

--sebastian

PS:

I still have to comment on this statement: "I don't think the current
state of the code means it's feasible to truly evolve it towards things
like Hadoop 2, Spark, real-time."

To me this sounds like a marketing statement, "look, we can give you
something better than Mahout". Porting Mahout's algorithms to Spark is
something that can be done with very little effort; I ported
RowSimilarityJob in a single evening recently as a getting-started
exercise with Spark. Making the codebase ready is only a matter of the
will to invest time and effort.




On 12.11.2013 16:54, Sean Owen wrote:
> On Tue, Nov 12, 2013 at 2:13 PM, Ted Dunning <te...@gmail.com> wrote:
>> Cloudera's primary influence is to get you to ask to go emeritus, i.e. stop
>> contributing.
>>
>> You have contributed in the past.  That's great.  And now you work for
>> Cloudera.
> 
> I started building on a new code base and left the PMC from about mid
> 2012 and began at Cloudera in July 2013. Right -- check the archives?
> I mean... it doesn't add up even time-wise.
> 
> It's only relevant in that I hope to expose and defuse this suggestion
> of some kind of plot. Certainly, it's best to steer clear of what
> might be perceived as vendor stone-throwing... I am sure it's not
> relevant to dev@.
> 
> 
>> Getting a paycheck is also a legitimate reason for you to do this.  And it
>> should be recognized where the paycheck comes from and what is really going
>> on.
> 
> A plot so deep even the plotters are unaware! I am definitely paid to
> write open source code as are a lot of people here and it's a Good
> Thing. Surely we do not suggest otherwise?
> 
> 
>> Well, I think that it is a hypocrisy fail going on.  I get criticized all
>> the time by Cloudera employees for "not being open".  And now the shoe is
>> on the other foot where Cloudera decides it is better to not contribute to
>> an existing open source project and, indeed, even hires away a key
>> developer of same.
> 
> I don't understand the equivalence -- was it not clear that Oryx is
> open source not proprietary? -- but pursuing it is just going to look
> like vendor spat.
> 
> I don't understand the idea that contributing to one open source
> project is wrong, but to another is right. Mahout is not more sacred
> than any other, nor more open or important by having an Apache badge.
> It can't be that, because Mahout exists, nobody else should try to
> write anything like ML on Hadoop.
> 
> Ted, sorry to be on your blacklist -- a lesson to anyone else thinking
> of leaving an Apache project? ay, you know where I live! I am happy to
> be accused of working on another open project now, but hope nobody
> agrees with the other suggestions. I'd feel bad if it were read widely
> this way.
> 


Re: Cloudera announces Oryx

Posted by Sean Owen <sr...@gmail.com>.
On Tue, Nov 12, 2013 at 2:13 PM, Ted Dunning <te...@gmail.com> wrote:
> Cloudera's primary influence is to get you to ask to go emeritus, i.e. stop
> contributing.
>
> You have contributed in the past.  That's great.  And now you work for
> Cloudera.

I started building on a new code base and left the PMC from about mid
2012 and began at Cloudera in July 2013. Right -- check the archives?
I mean... it doesn't add up even time-wise.

It's only relevant in that I hope to expose and defuse this suggestion
of some kind of plot. Certainly, it's best to steer clear of what
might be perceived as vendor stone-throwing... I am sure it's not
relevant to dev@.


> Getting a paycheck is also a legitimate reason for you to do this.  And it
> should be recognized where the paycheck comes from and what is really going
> on.

A plot so deep even the plotters are unaware! I am definitely paid to
write open source code as are a lot of people here and it's a Good
Thing. Surely we do not suggest otherwise?


> Well, I think that it is a hypocrisy fail going on.  I get criticized all
> the time by Cloudera employees for "not being open".  And now the shoe is
> on the other foot where Cloudera decides it is better to not contribute to
> an existing open source project and, indeed, even hires away a key
> developer of same.

I don't understand the equivalence -- was it not clear that Oryx is
open source not proprietary? -- but pursuing it is just going to look
like vendor spat.

I don't understand the idea that contributing to one open source
project is wrong, but to another is right. Mahout is not more sacred
than any other, nor more open or important by having an Apache badge.
It can't be that, because Mahout exists, nobody else should try to
write anything like ML on Hadoop.

Ted, sorry to be on your blacklist -- a lesson to anyone else thinking
of leaving an Apache project? ay, you know where I live! I am happy to
be accused of working on another open project now, but hope nobody
agrees with the other suggestions. I'd feel bad if it were read widely
this way.

Re: Cloudera announces Oryx

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Nov 12, 2013 at 1:46 PM, Sean Owen <sr...@gmail.com> wrote:

> I think I'm the biggest single contributor to Mahout over time (? was
> at one point), and so by extension Cloudera is. And this new project
> is all open source. Surely that's maximally "walking the walk" in
> these regards?
>

Absolutely not.

Cloudera's primary influence is to get you to ask to go emeritus, i.e. stop
contributing.

You have contributed in the past.  That's great.  And now you work for
Cloudera.


> Mahout has served well for a long time as measured in Hadoop-years --
> like 4+ years. It's still in usable life. I don't think the current
> state of the code means it's feasible to truly evolve it towards
> things like Hadoop 2, Spark, real-time.


I also disagree strongly here.


> That is to say, there are
> legitimate reasons to start forward from a new project with different
> goals.
>

Getting a paycheck is also a legitimate reason for you to do this.  And it
should be recognized where the paycheck comes from and what is really going
on.


> CDH5 still supports Mahout for sure. Oryx will work on any Hadoop (2)
> distro. I hope there is no openness foul here.
>

Well, I think that it is a hypocrisy fail going on.  I get criticized all
the time by Cloudera employees for "not being open".  And now the shoe is
on the other foot where Cloudera decides it is better to not contribute to
an existing open source project and, indeed, even hires away a key
developer of same.

The only foul is not calling out the situation for what it is.

Re: Cloudera announces Oryx

Posted by Sean Owen <sr...@gmail.com>.
I think I'm the biggest single contributor to Mahout over time (? was
at one point), and so by extension Cloudera is. And this new project
is all open source. Surely that's maximally "walking the walk" in
these regards?

Mahout has served well for a long time as measured in Hadoop-years --
like 4+ years. It's still in usable life. I don't think the current
state of the code means it's feasible to truly evolve it towards
things like Hadoop 2, Spark, real-time. That is to say, there are
legitimate reasons to start forward from a new project with different
goals.

CDH5 still supports Mahout for sure. Oryx will work on any Hadoop (2)
distro. I hope there is no openness foul here.


