Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2013/06/08 12:55:41 UTC

[DRAFT] 0.8 Release Announcement + Future Plans Discussion

Hi Mahouts,

A full copy of the proposed draft release notes is up at https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8. Please add/edit as appropriate.

IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE PLANS__, which I have included below. This is purely my own opinion, but I think it reflects conversations I've had w/ both Robin and Sebastian at Berlin Buzzwords. I'm also interested in opinions on my proposed deprecation plan (which I haven't discussed with anyone) which is put forth in the 1.0 plans below.

-------------------------- DRAFT -------------------------
FUTURE PLANS

0.9

As the project moves towards a 1.0 release, the community is working to clean up and/or remove parts of the code base that are under-supported or that underperform as well as to better focus the energy and contributions on key algorithms that are proven to scale in production and have seen wide-spread adoption. To this end, in the next release, the project is planning on removing support for the following algorithms unless there is sustained support and improvement of them before the next release.

The algorithms to be removed are:
- From Clustering:
      Dirichlet
      MeanShift
      MinHash
- From Classification (both are sequential implementations)
      Winnow
      Perceptron
- Frequent Pattern Mining
- Collaborative Filtering
      GSI: DO ANY GO HERE?
- Other
      GSI: ANYTHING?

If you are interested in supporting 1 or more of these algorithms, please make it known on dev@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide supporting evidence as to their effectiveness for you in production.

1.0 PLANS

Our plans as a community are to focus 0.9 on cleanup of bugs and the removal of the code mentioned above and then to follow with a 1.0 release soon thereafter, at which point the community is committing to the support of the algorithms packaged in the 1.0 for at least two minor versions after their release. In the case of removal, we will deprecate the functionality in the 1.(x+1) minor release and remove it in the 1.(x+2) release. For instance, if feature X is to be removed after the 1.2 release, it will be deprecated in 1.3 and removed in 1.4.

------------------- DRAFT ----------------------

-Grant
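
A minimal sketch of the 1.(x+1) / 1.(x+2) rule from the 1.0 plans above, for anyone who wants to sanity-check the arithmetic. The class and method names (DeprecationSchedule, deprecatedIn, removedIn) are hypothetical and not part of Mahout; x is assumed to be the minor number of the last 1.x release after which the removal is decided.

```java
// Hypothetical helper, not Mahout code: illustrates the proposed
// "deprecate in 1.(x+1), remove in 1.(x+2)" schedule.
final class DeprecationSchedule {

    private DeprecationSchedule() {}

    // Release in which the feature would be marked deprecated.
    static String deprecatedIn(int x) {
        checkMinor(x);
        return "1." + (x + 1);
    }

    // Release in which the feature would actually be removed.
    static String removedIn(int x) {
        checkMinor(x);
        return "1." + (x + 2);
    }

    private static void checkMinor(int x) {
        if (x < 0) {
            throw new IllegalArgumentException("minor version must be >= 0, got " + x);
        }
    }

    public static void main(String[] args) {
        int x = 2; // removal decided after the 1.2 release
        System.out.println("feature X: deprecated in " + deprecatedIn(x)
                + ", removed in " + removedIn(x));
        // prints "feature X: deprecated in 1.3, removed in 1.4"
    }
}
```

With x = 2 this reproduces the example in the draft: deprecated in 1.3, removed in 1.4.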

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Sean Owen <sr...@gmail.com>.

I agree with deprecating all of that FWIW.

On Sat, Jun 8, 2013 at 6:33 PM, Grant Ingersoll <gs...@apache.org> wrote:
>> Collaborative Filtering:
>>
>> - all recommenders in o.a.m.cf.taste.impl.recommender.knn
>>
>> - the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
>>
>> - the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
>> o.a.m.cf.taste.impl.recommender.slopeone
>>
>> - the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo
>
> Pseudo is useful, no?  Don't know about the others.

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Sebastian Schelter <ss...@apache.org>.

I didn't refer to the stats stuff in math, but to stats.entropy in core/
which has Hadoop code to calculate entropy etc.


2013/6/9 Ted Dunning <te...@gmail.com>

> Actually this stats stuff is definitely used in application code (of mine
> if not others).
>
> The OnlineSummarizer has 20 usages throughout Mahout.
>
>
> On Sat, Jun 8, 2013 at 11:08 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
> > Yes, please edit the Wiki directly with the highlights!
> >
> >
> > On Jun 8, 2013, at 3:21 PM, Suneel Marthi <su...@yahoo.com> wrote:
> >
> > > Under Release Highlights, please also add:
> > >
> > > a) Dan's Streaming kmeans clustering.
> > > b) Mahout upgrade to be Lucene 4.3.0 compatible
> > >
> > >
> > > (both of the above deserve special mentions along with lucene2seq and
> > > vector/matrix performance improvements).

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Ted Dunning <te...@gmail.com>.

Actually this stats stuff is definitely used in application code (of mine
if not others).

The OnlineSummarizer has 20 usages throughout Mahout.

On Sat, Jun 8, 2013 at 11:08 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Yes, please edit the Wiki directly with the highlights!
>
>
> On Jun 8, 2013, at 3:21 PM, Suneel Marthi <su...@yahoo.com> wrote:
>
> > Under Release Highlights, please also add:
> >
> > a) Dan's Streaming kmeans clustering.
> > b) Mahout upgrade to be Lucene 4.3.0 compatible
> >
> >
> > (both of the above deserve special mentions along with lucene2seq and
> > vector/matrix performance improvements).
>
>

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Grant Ingersoll <gs...@apache.org>.

Yes, please edit the Wiki directly with the highlights!

On Jun 8, 2013, at 3:21 PM, Suneel Marthi <su...@yahoo.com> wrote:

> Under Release Highlights, please also add:
> 
> a) Dan's Streaming kmeans clustering.
> b) Mahout upgrade to be Lucene 4.3.0 compatible 
> 
> 
> (both of the above deserve special mentions along with lucene2seq and vector/matrix performance improvements).

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Suneel Marthi <su...@yahoo.com>.

Under Release Highlights, please also add:

a) Dan's Streaming kmeans clustering.
b) Mahout upgrade to be Lucene 4.3.0 compatible 


(both of the above deserve special mentions along with lucene2seq and vector/matrix performance improvements).



________________________________
 From: Grant Ingersoll <gs...@apache.org>
To: dev@mahout.apache.org; ssc@apache.org 
Cc: user@mahout.apache.org 
Sent: Saturday, June 8, 2013 1:33 PM
Subject: Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion
 


On Jun 8, 2013, at 1:26 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Grant,
> 
> Very good release announcement. I propose that we deprecate a lot more,
> I think we should be aggressive here to pave the way for a clean and
> slim 1.0 release.
> 
> I propose to additionally deprecate the following algorithms, as to my
> state of knowledge, they are not actively used:
> 
> Collaborative Filtering:
> 
> - all recommenders in o.a.m.cf.taste.impl.recommender.knn
> 
> - the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
> 
> - the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
> o.a.m.cf.taste.impl.recommender.slopeone
> 
> - the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Pseudo is useful, no?  Don't know about the others.

> 
> Classification:
> 
> - the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

We have some parallel training stuff coming, so I'd say -1 here, as I think HMMs are pretty important, no?

> 
> Clustering
> 
> - Fuzzy k-Means o.a.m.clustering.fuzzykmeans
> - Spectral k-Means in o.a.m.clustering.spectral

-1 on spectral being dropped as that seems to receive decent traction.

Not sure on Fuzzy, as I think it is a pretty trivial extension of K-Means.

> 
> Math
> 
> - the tooling in o.a.m.math.stats.entropy
> 
> Furthermore, I think we should deprecate the Lanczos implementation in
> o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

No opinion.

+1 on everything else.

> 
> To all users and other committers, this is a biased first proposal,
> please shout if you see things differently and want to have things kept.
> 
> Best,
> Sebastian
> 
> 
> On 08.06.2013 16:42, Grant Ingersoll wrote:
>> More tests are always welcome.
>> 
>> On Jun 8, 2013, at 10:29 AM, Ravi Mummulla <ra...@gmail.com> wrote:
>> 
>>> Hi Grant,
>>> Regarding 1.0 plans, do we also want to include a note on adding tests
>>> where they don't exist or improving them where needed or is that implicit?
>>> 
>>> Thanks.
>>> 
>>> 
>>> On Sat, Jun 8, 2013 at 3:55 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>> 
>>>> Hi Mahouts,
>>>> 
>>>> A full copy of the proposed draft release notes is up at
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8.  Please
>>>> add/edit as appropriate.
>>>> 
>>>> IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE
>>>> PLANS__, which I have included below.  This is purely my own opinion, but I
>>>> think it reflects conversations I've had w/ both Robin and Sebastian at
>>>> Berlin Buzzwords.   I'm also interested in opinions on my proposed
>>>> deprecation plan (which I haven't discussed with anyone) which is put forth
>>>> in the 1.0 plans below.
>>>> 
>>>> --------------------------  DRAFT -------------------------
>>>> FUTURE PLANS
>>>> 
>>>> 0.9
>>>> 
>>>> As the project moves towards a 1.0 release, the community is working to
>>>> clean up and/or remove parts of the code base that are under-supported or
>>>> that underperform as well as to better focus the energy and contributions
>>>> on key algorithms that are proven to scale in production and have seen
>>>> wide-spread adoption.  To this end, in the next release, the project is
>>>> planning on removing support for the following algorithms unless there is
>>>> sustained support and improvement of them before the next release.
>>>> 
>>>> The algorithms to be removed are:
>>>> - From Clustering:
>>>>       Dirichlet
>>>>       MeanShift
>>>>       MinHash
>>>> - From Classification (both are sequential implementations)
>>>>       Winnow
>>>>       Perceptron
>>>> - Frequent Pattern Mining
>>>> - Collaborative Filtering
>>>>       GSI: DO ANY GO HERE?
>>>> - Other
>>>>       GSI: ANYTHING?
>>>> 
>>>> If you are interested in supporting 1 or more of these algorithms, please
>>>> make it known on dev@mahout.apache.org and via JIRA issues that fix
>>>> and/or improve them.  Please also provide supporting evidence as to their
>>>> effectiveness for you in production.
>>>> 
>>>> 1.0 PLANS
>>>> 
>>>> Our plans as a community are to focus 0.9 on cleanup of bugs and the
>>>> removal of the code mentioned above and then to follow with a 1.0 release
>>>> soon thereafter, at which point the community is committing to the support
>>>> of the algorithms packaged in the 1.0 for at least two minor versions after
>>>> their release.  In the case of removal, we will deprecate the functionality
>>>> in the 1.(x+1) minor release and remove it in the 1.(x+2) release.  For
>>>> instance, if feature X is to be removed after the 1.2 release, it will be
>>>> deprecated in 1.3 and removed in 1.4.
>>>> 
>>>> ------------------- DRAFT ----------------------
>>>> 
>>>> -Grant
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Thanks.
>> 
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>> 
>> 
> 

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Shannon Quinn <sq...@gatech.edu>.

Sorry, that's o.a.m.clustering.spectral.eigencuts. Then move the .kmeans
package to simply be o.a.m.clustering.spectral.

On 6/8/13 1:37 PM, Shannon Quinn wrote:
>
>>> Clustering
>>>
>>> - Fuzzy k-Means o.a.m.clustering.fuzzykmeans
>>> - Spectral k-Means in o.a.m.clustering.spectral
>> -1 on spectral being dropped as that seems to receive decent traction.
> Agreed, given recent activity in particular. However I would put forth 
> deprecating Eigencuts (o.a.m.clustering.eigencuts) until such time 
> that it can be made scalable.
>

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Shannon Quinn <sq...@gatech.edu>.

>> Clustering
>>
>> - Fuzzy k-Means o.a.m.clustering.fuzzykmeans
>> - Spectral k-Means in o.a.m.clustering.spectral
> -1 on spectral being dropped as that seems to receive decent traction.
Agreed, given recent activity in particular. However I would put forth 
deprecating Eigencuts (o.a.m.clustering.eigencuts) until such time that 
it can be made scalable.

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Grant Ingersoll <gs...@apache.org>.

On Jun 8, 2013, at 1:26 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Hi Grant,
> 
> Very good release announcement. I propose that we deprecate a lot more,
> I think we should be aggressive here to pave the way for a clean and
> slim 1.0 release.
> 
> I propose to additionally deprecate the following algorithms, as to my
> state of knowledge, they are not actively used:
> 
> Collaborative Filtering:
> 
> - all recommenders in o.a.m.cf.taste.impl.recommender.knn
> 
> - the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
> 
> - the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
> o.a.m.cf.taste.impl.recommender.slopeone
> 
> - the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Pseudo is useful, no?  Don't know about the others.

> 
> Classification:
> 
> - the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

We have some parallel training stuff coming, so I'd say -1 here, as I think HMMs are pretty important, no?

> 
> Clustering
> 
> - Fuzzy k-Means o.a.m.clustering.fuzzykmeans
> - Spectral k-Means in o.a.m.clustering.spectral

-1 on spectral being dropped as that seems to receive decent traction.

Not sure on Fuzzy, as I think it is a pretty trivial extension of K-Means.

> 
> Math
> 
> - the tooling in o.a.m.math.stats.entropy
> 
> Furthermore, I think we should deprecate the Lanczos implementation in
> o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

No opinion.

+1 on everything else.

> 
> To all users and other committers, this is a biased first proposal,
> please shout if you see things differently and want to have things kept.
> 
> Best,
> Sebastian
> 
> 
> On 08.06.2013 16:42, Grant Ingersoll wrote:
>> More tests are always welcome.
>> 
>> On Jun 8, 2013, at 10:29 AM, Ravi Mummulla <ra...@gmail.com> wrote:
>> 
>>> Hi Grant,
>>> Regarding 1.0 plans, do we also want to include a note on adding tests
>>> where they don't exist or improving them where needed or is that implicit?
>>> 
>>> Thanks.
>>> 
>>> 
>>> On Sat, Jun 8, 2013 at 3:55 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>> 
>>>> Hi Mahouts,
>>>> 
>>>> A full copy of the proposed draft release notes is up at
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8.  Please
>>>> add/edit as appropriate.
>>>> 
>>>> IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE
>>>> PLANS__, which I have included below.  This is purely my own opinion, but I
>>>> think it reflects conversations I've had w/ both Robin and Sebastian at
>>>> Berlin Buzzwords.   I'm also interested in opinions on my proposed
>>>> deprecation plan (which I haven't discussed with anyone) which is put forth
>>>> in the 1.0 plans below.
>>>> 
>>>> --------------------------  DRAFT -------------------------
>>>> FUTURE PLANS
>>>> 
>>>> 0.9
>>>> 
>>>> As the project moves towards a 1.0 release, the community is working to
>>>> clean up and/or remove parts of the code base that are under-supported or
>>>> that underperform as well as to better focus the energy and contributions
>>>> on key algorithms that are proven to scale in production and have seen
>>>> wide-spread adoption.  To this end, in the next release, the project is
>>>> planning on removing support for the following algorithms unless there is
>>>> sustained support and improvement of them before the next release.
>>>> 
>>>> The algorithms to be removed are:
>>>> - From Clustering:
>>>>       Dirichlet
>>>>       MeanShift
>>>>       MinHash
>>>> - From Classification (both are sequential implementations)
>>>>       Winnow
>>>>       Perceptron
>>>> - Frequent Pattern Mining
>>>> - Collaborative Filtering
>>>>       GSI: DO ANY GO HERE?
>>>> - Other
>>>>       GSI: ANYTHING?
>>>> 
>>>> If you are interested in supporting 1 or more of these algorithms, please
>>>> make it known on dev@mahout.apache.org and via JIRA issues that fix
>>>> and/or improve them.  Please also provide supporting evidence as to their
>>>> effectiveness for you in production.
>>>> 
>>>> 1.0 PLANS
>>>> 
>>>> Our plans as a community are to focus 0.9 on cleanup of bugs and the
>>>> removal of the code mentioned above and then to follow with a 1.0 release
>>>> soon thereafter, at which point the community is committing to the support
>>>> of the algorithms packaged in the 1.0 for at least two minor versions after
>>>> their release.  In the case of removal, we will deprecate the functionality
>>>> in the 1.(x+1) minor release and remove it in the 1.(x+2) release.  For
>>>> instance, if feature X is to be removed after the 1.2 release, it will be
>>>> deprecated in 1.3 and removed in 1.4.
>>>> 
>>>> ------------------- DRAFT ----------------------
>>>> 
>>>> -Grant
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Thanks.
>> 
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>> 
>> 
> 

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Grant,

Very good release announcement. I propose that we deprecate a lot more,
I think we should be aggressive here to pave the way for a clean and
slim 1.0 release.

I propose to additionally deprecate the following algorithms, as to my
state of knowledge, they are not actively used:

Collaborative Filtering:

- all recommenders in o.a.m.cf.taste.impl.recommender.knn

- the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender

- the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
o.a.m.cf.taste.impl.recommender.slopeone

- the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Classification:

- the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

Clustering

- Fuzzy k-Means o.a.m.clustering.fuzzykmeans
- Spectral k-Means in o.a.m.clustering.spectral

Math

- the tooling in o.a.m.math.stats.entropy

Furthermore, I think we should deprecate the Lanczos implementation in
o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

To all users and other committers, this is a biased first proposal,
please shout if you see things differently and want to have things kept.

Best,
Sebastian


On 08.06.2013 16:42, Grant Ingersoll wrote:
> More tests are always welcome.
> 
> On Jun 8, 2013, at 10:29 AM, Ravi Mummulla <ra...@gmail.com> wrote:
> 
>> Hi Grant,
>> Regarding 1.0 plans, do we also want to include a note on adding tests
>> where they don't exist or improving them where needed or is that implicit?
>>
>> Thanks.
>>
>>
>> On Sat, Jun 8, 2013 at 3:55 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>> Hi Mahouts,
>>>
>>> A full copy of the proposed draft release notes is up at
>>> https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8.  Please
>>> add/edit as appropriate.
>>>
>>> IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE
>>> PLANS__, which I have included below.  This is purely my own opinion, but I
>>> think it reflects conversations I've had w/ both Robin and Sebastian at
>>> Berlin Buzzwords.   I'm also interested in opinions on my proposed
>>> deprecation plan (which I haven't discussed with anyone) which is put forth
>>> in the 1.0 plans below.
>>>
>>> --------------------------  DRAFT -------------------------
>>> FUTURE PLANS
>>>
>>> 0.9
>>>
>>> As the project moves towards a 1.0 release, the community is working to
>>> clean up and/or remove parts of the code base that are under-supported or
>>> that underperform as well as to better focus the energy and contributions
>>> on key algorithms that are proven to scale in production and have seen
>>> wide-spread adoption.  To this end, in the next release, the project is
>>> planning on removing support for the following algorithms unless there is
>>> sustained support and improvement of them before the next release.
>>>
>>> The algorithms to be removed are:
>>> - From Clustering:
>>>        Dirichlet
>>>        MeanShift
>>>        MinHash
>>> - From Classification (both are sequential implementations)
>>>        Winnow
>>>        Perceptron
>>> - Frequent Pattern Mining
>>> - Collaborative Filtering
>>>        GSI: DO ANY GO HERE?
>>> - Other
>>>        GSI: ANYTHING?
>>>
>>> If you are interested in supporting 1 or more of these algorithms, please
>>> make it known on dev@mahout.apache.org and via JIRA issues that fix
>>> and/or improve them.  Please also provide supporting evidence as to there
>>> effectiveness for you in production.
>>>
>>> 1.0 PLANS
>>>
>>> Our plans as a community are to focus 0.9 on cleanup of bugs and the
>>> removal of the code mentioned above and then to follow with a 1.0 release
>>> soon thereafter, at which point the community is committing to the support
>>> of the algorithms packaged in the 1.0 for at least two minor versions after
>>> their release.  In the case of removal, we will deprecate the functionality
>>> in the 1.(x+1) minor release and remove it in the 1.(x+2) release.  For
>>> instance, if feature X is to be removed after the 1.2 release, it will be
>>> deprecated in 1.3 and removed in 1.4.
>>>
>>> ------------------- DRAFT ----------------------
>>>
>>> -Grant
>>
>>
>>
>>
>> -- 
>> Thanks.
> 
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Sebastian Schelter <ss...@apache.org>.

Hi Grant,

Very good release announcement. I propose that we deprecate a lot more;
I think we should be aggressive here to pave the way for a clean and
slim 1.0 release.

I propose to additionally deprecate the following algorithms because, to
the best of my knowledge, they are not actively used:

Collaborative Filtering:

- all recommenders in o.a.m.cf.taste.impl.recommender.knn

- the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender

- the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
o.a.m.cf.taste.impl.recommender.slopeone

- the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Classification:

- the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

Clustering:

- Fuzzy k-Means in o.a.m.clustering.fuzzykmeans
- Spectral k-Means in o.a.m.clustering.spectral

Math:

- the tooling in o.a.m.math.stats.entropy

Furthermore, I think we should deprecate the Lanczos implementation in
o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

To all users and other committers: this is a biased first proposal, so
please shout if you see things differently and want things kept.

Best,
Sebastian



Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Grant Ingersoll <gs...@apache.org>.

More tests are always welcome.

On Jun 8, 2013, at 10:29 AM, Ravi Mummulla <ra...@gmail.com> wrote:

> Hi Grant,
> Regarding 1.0 plans, do we also want to include a note on adding tests
> where they don't exist or improving them where needed or is that implicit?
> 
> Thanks.
> 
> 

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

Posted by Ravi Mummulla <ra...@gmail.com>.

Hi Grant,
Regarding 1.0 plans, do we also want to include a note on adding tests
where they don't exist or improving them where needed, or is that implicit?

Thanks.


-- 
Thanks.