Posted to dev@mahout.apache.org by Benson Margulies <bi...@gmail.com> on 2010/01/25 18:08:56 UTC

Release thinking

I would be very happy to see a release with the Colt collections
before we switch, just so that there is something in the central repo
with an ASL license and some primitive collections. What's the current
release thinking?

p.s. I'll start on the math split tonight.

Re: Release thinking

Posted by Olivier Grisel <ol...@ensta.org>.
2010/2/8 Sean <sr...@gmail.com>:

As for MAHOUT-228 (SGD for logistic regression), I could not continue
the work I started on this during the last two weeks and don't have
time this week either. Furthermore, we need to align it to a common
API for document vectorizers (factoring out the murmurhash +
randomvectorizer related classes) and a generic classifier interface
to build F1 / precision / recall measures on top of it (to be shared
with bayes and pegasos classifiers).

So +1 for moving this to 0.4.

-- 
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name

Re: Release thinking

Posted by Sean <sr...@gmail.com>.
I suppose yeah :) but I was just being snarky and not serious.

On Mon, Feb 8, 2010 at 3:04 PM, Robin Anil <ro...@gmail.com> wrote:
> 0.2.5 you mean? otherwise we are skipping numbers :)

Re: Release thinking

Posted by Robin Anil <ro...@gmail.com>.
On Mon, Feb 8, 2010 at 8:32 PM, Sean <sr...@gmail.com> wrote:

> On Sun, Feb 7, 2010 at 10:56 PM, Jake Mannix <ja...@gmail.com>
> wrote:
> > MAHOUT-185 Add mahout shell script for easy launching of various algorithms
>
> This one I'd disagree with, it's not a blocker in any sense, has been
> on the books a while and there's not evidence of progress. Marking it
> for 0.4 in no way means it won't happen, just indicates we're not
> waiting another week on it.
>
> > MAHOUT-228 Need sequential logistic regression implementation using SGD techniques
> > MAHOUT-180 port Hadoop-ified Lanczos SVD implementation from decomposer
> > MAHOUT-242 LLR Collocation Identifier
> > The first two have been on the 0.3 list for a while and should go in.
> >  MAHOUT-180 I've reopened to track the fact that I've got a patch coming
> > soon (before we cut 0.3) which puts Lanczos on Hadoop (It just needs
> > testing).  MAHOUT-242 was planned for 0.3 for ages, it just hadn't been
> > tagged with the release properly.
>
> Same question I guess, who is finishing these in the next couple days?
> we were supposed to be releasing 0.3 last week.
>
> Could I talk people into an 0.25 release? and do 0.3 in a couple weeks?
> And if so could we call 0.25 0.3, and 0.3 0.4? you get my drift.
>

0.2.5 you mean? otherwise we are skipping numbers :)

>
> We should be releasing more frequently. A call to release 0.3 isn't a
> call to finish everything you want to do in the next few months, it's
> a call to make sure there aren't glaring bugs or loose ends.
>

Re: Release thinking

Posted by Sean <sr...@gmail.com>.
On Sun, Feb 7, 2010 at 10:56 PM, Jake Mannix <ja...@gmail.com> wrote:
> MAHOUT-185 Add mahout shell script for easy launching of various algorithms

This one I'd disagree with, it's not a blocker in any sense, has been
on the books a while and there's not evidence of progress. Marking it
for 0.4 in no way means it won't happen, just indicates we're not
waiting another week on it.

> MAHOUT-228 Need sequential logistic regression implementation using SGD techniques
> MAHOUT-180 port Hadoop-ified Lanczos SVD implementation from decomposer
> MAHOUT-242 LLR Collocation Identifier
> The first two have been on the 0.3 list for a while and should go in.
>  MAHOUT-180 I've reopened to track the fact that I've got a patch coming
> soon (before we cut 0.3) which puts Lanczos on Hadoop (It just needs
> testing).  MAHOUT-242 was planned for 0.3 for ages, it just hadn't been
> tagged with the release properly.

Same question I guess, who is finishing these in the next couple days?
we were supposed to be releasing 0.3 last week.

Could I talk people into an 0.25 release? and do 0.3 in a couple weeks?
And if so could we call 0.25 0.3, and 0.3 0.4? you get my drift.

We should be releasing more frequently. A call to release 0.3 isn't a
call to finish everything you want to do in the next few months, it's
a call to make sure there aren't glaring bugs or loose ends.

Re: Release thinking

Posted by Jake Mannix <ja...@gmail.com>.
On Sun, Feb 7, 2010 at 2:46 PM, Sean <sr...@gmail.com> wrote:

> The number's going up again though -- I think anything created since
> last week should be marked 0.4 unless it's an important bug or very quick
> win.
>
> What if I put it this way -- if I marked all open 0.3 issues as 0.4
> right now, what would the objections be?
>

This is a good way to think of it, here would be my objections:

MAHOUT-185 Add mahout shell script for easy launching of various algorithms
MAHOUT-228 Need sequential logistic regression implementation using SGD techniques
MAHOUT-180 port Hadoop-ified Lanczos SVD implementation from decomposer
MAHOUT-242 LLR Collocation Identifier
The first two have been on the 0.3 list for a while and should go in.
 MAHOUT-180 I've reopened to track the fact that I've got a patch coming
soon (before we cut 0.3) which puts Lanczos on Hadoop (It just needs
testing).  MAHOUT-242 was planned for 0.3 for ages, it just hadn't been
tagged with the release properly.

Other than these, I'd +1 pushing everything else out to 0.4

  -jake



>
>
>
> On Fri, Feb 5, 2010 at 4:31 PM, Robin Anil <ro...@gmail.com> wrote:
> > Yum Yum.
> >
> > 0.1 59 issues
> > 0.2 66 issues
> > 0.3 91 issues - 13 left
> >
>

Re: Release thinking

Posted by Sean <sr...@gmail.com>.
The number's going up again though -- I think anything created since
last week should be marked 0.4 unless it's an important bug or very quick
win.

What if I put it this way -- if I marked all open 0.3 issues as 0.4
right now, what would the objections be?



On Fri, Feb 5, 2010 at 4:31 PM, Robin Anil <ro...@gmail.com> wrote:
> Yum Yum.
>
> 0.1 59 issues
> 0.2 66 issues
> 0.3 91 issues - 13 left
>

Re: Release thinking

Posted by Drew Farris <dr...@gmail.com>.
Sounds great to me.

On Fri, Feb 5, 2010 at 11:50 AM, Ted Dunning <te...@gmail.com> wrote:
> Makes a lot of sense.  Drew?
>
> On Fri, Feb 5, 2010 at 8:48 AM, Jake Mannix <ja...@gmail.com> wrote:
>
>> So are we really planning on all this structured document stuff and Avro
>> for
>> 0.3?  Can we just try and finish up what was already scoped for 0.3 and
>> have
>> a quick turnaround for getting things which have only been really started
>> worked on in the past week or so for 0.4 sometime next month?
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: Release thinking

Posted by Ted Dunning <te...@gmail.com>.
Makes a lot of sense.  Drew?

On Fri, Feb 5, 2010 at 8:48 AM, Jake Mannix <ja...@gmail.com> wrote:

> So are we really planning on all this structured document stuff and Avro
> for
> 0.3?  Can we just try and finish up what was already scoped for 0.3 and
> have
> a quick turnaround for getting things which have only been really started
> worked on in the past week or so for 0.4 sometime next month?
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Release thinking

Posted by Robin Anil <ro...@gmail.com>.
I just updated it here.

http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html

Let's rename/refactor the classes and get the basic Avro thing in for 0.3, so
that people who use it get a smooth upgrade to 0.4.

Robin

On Fri, Feb 5, 2010 at 10:32 PM, Drew Farris <dr...@gmail.com> wrote:

> On Fri, Feb 5, 2010 at 11:53 AM, Jake Mannix <ja...@gmail.com>
> wrote:
>
> >
> > Which is not to say that we shouldn't continue work on them, let's keep
> the
> > patches going and up to date, let's just not worry about holding up 0.3
> > until they're fully tested and checked in.
>
> Yes absolutely. I'm also interested in hearing Robin's thoughts on how
> far the current document vectorizer, n-gram work should go for 0.3
>
> Drew
>

Re: Release thinking

Posted by Drew Farris <dr...@gmail.com>.
On Fri, Feb 5, 2010 at 11:53 AM, Jake Mannix <ja...@gmail.com> wrote:

>
> Which is not to say that we shouldn't continue work on them, let's keep the
> patches going and up to date, let's just not worry about holding up 0.3
> until they're fully tested and checked in.

Yes absolutely. I'm also interested in hearing Robin's thoughts on how
far the current document vectorizer, n-gram work should go for 0.3

Drew

Re: Release thinking

Posted by Jake Mannix <ja...@gmail.com>.
On Fri, Feb 5, 2010 at 8:48 AM, Jake Mannix <ja...@gmail.com> wrote:

> So are we really planning on all this structured document stuff and Avro
> for 0.3?  Can we just try and finish up what was already scoped for 0.3 and
> have a quick turnaround for getting things which have only been really
> started worked on in the past week or so for 0.4 sometime next month?


Which is not to say that we shouldn't continue work on them, let's keep the
patches going and up to date, let's just not worry about holding up 0.3
until they're fully tested and checked in.

  -jake

Re: Release thinking

Posted by Jake Mannix <ja...@gmail.com>.
So are we really planning on all this structured document stuff and Avro for
0.3?  Can we just try and finish up what was already scoped for 0.3 and have
a quick turnaround for getting things which have only been really started
worked on in the past week or so for 0.4 sometime next month?

  -jake

On Fri, Feb 5, 2010 at 8:31 AM, Robin Anil <ro...@gmail.com> wrote:

> Yum Yum.
>
> 0.1 59 issues
> 0.2 66 issues
> 0.3 91 issues - 13 left
>
>
>
>
>
> On Fri, Feb 5, 2010 at 9:47 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > I just marked the 0.1 and 0.2 releases as released (about time).  This
> > makes
> > the JIRA road map feature more usable.
> >
> > See here for the live version of this summary:
> >
> >
> https://issues.apache.org/jira/browse/MAHOUT?report=com.atlassian.jira.plugin.system.project:roadmap-panel
> >
> > On Fri, Feb 5, 2010 at 3:16 AM, Robin Anil <ro...@gmail.com> wrote:
> >
> > > Reviving this thread. Copy paste the whole thing as we move forward
> > >
> > > Current Snapshot
> > >
> > > Key     Summary
> > > > MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern
> > > mining
> > > >    Done
> > > > MAHOUT-227      Parallel SVM   In Progress
> > > > MAHOUT-240      Parallel version of Perceptron   Little Progress
> > > > MAHOUT-241      Example for perceptron     Little Progress
> > > > MAHOUT-185      Add mahout shell script for easy launching of various
> > > > algorithms   In Progress
> > > > MAHOUT-153      Implement kmeans++ for initial cluster selection in
> > > > kmeans    Little Progress  (There is discussion, but no patch yet)
> > > > MAHOUT-232      Implementation of sequential SVM solver based on
> > Pegasos
> > >    In
> > > > Progress
> > > > MAHOUT-228      Need sequential logistic regression implementation
> > using
> > > > SGD techniques     In Progress
> > >
> > > MAHOUT-263      Matrix interface should extend Iterable<Vector> for
> > better
> > > > integration with distributed storage   Done
> > > > MAHOUT-237      Map/Reduce Implementation of Document Vectorizer
> Done
> > > > MAHOUT-220      Mahout Bayes Code cleanup     Done
> > >
> > > MAHOUT-265      Error with creating MVC from Lucene Index or Arff
> > Done
> > > > MAHOUT-215      Provide jars with mahout release.     Done
> > > > MAHOUT-209      Add aggregate() methods for Vector     Done
> > > > MAHOUT-231      Upgrade QM reports to use Clover 2.6    Little
> Progress
> > > Not
> > > > that required in the release(developer thing)
> > > >  MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.
>  In
> > > > Progress
> > > > MAHOUT-155      ARFF VectorIterable      Little Progress
> > > > MAHOUT-214      Implement Stacked RBM     Little Progress
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>

Re: Release thinking

Posted by Robin Anil <ro...@gmail.com>.
Yum Yum.

0.1 59 issues
0.2 66 issues
0.3 91 issues - 13 left





On Fri, Feb 5, 2010 at 9:47 PM, Ted Dunning <te...@gmail.com> wrote:

> I just marked the 0.1 and 0.2 releases as released (about time).  This
> makes
> the JIRA road map feature more usable.
>
> See here for the live version of this summary:
>
> https://issues.apache.org/jira/browse/MAHOUT?report=com.atlassian.jira.plugin.system.project:roadmap-panel
>
> On Fri, Feb 5, 2010 at 3:16 AM, Robin Anil <ro...@gmail.com> wrote:
>
> > Reviving this thread. Copy paste the whole thing as we move forward
> >
> > Current Snapshot
> >
> > Key     Summary
> > > MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern
> > mining
> > >    Done
> > > MAHOUT-227      Parallel SVM   In Progress
> > > MAHOUT-240      Parallel version of Perceptron   Little Progress
> > > MAHOUT-241      Example for perceptron     Little Progress
> > > MAHOUT-185      Add mahout shell script for easy launching of various
> > > algorithms   In Progress
> > > MAHOUT-153      Implement kmeans++ for initial cluster selection in
> > > kmeans    Little Progress  (There is discussion, but no patch yet)
> > > MAHOUT-232      Implementation of sequential SVM solver based on
> Pegasos
> >    In
> > > Progress
> > > MAHOUT-228      Need sequential logistic regression implementation
> using
> > > SGD techniques     In Progress
> >
> > MAHOUT-263      Matrix interface should extend Iterable<Vector> for
> better
> > > integration with distributed storage   Done
> > > MAHOUT-237      Map/Reduce Implementation of Document Vectorizer   Done
> > > MAHOUT-220      Mahout Bayes Code cleanup     Done
> >
> > MAHOUT-265      Error with creating MVC from Lucene Index or Arff
> Done
> > > MAHOUT-215      Provide jars with mahout release.     Done
> > > MAHOUT-209      Add aggregate() methods for Vector     Done
> > > MAHOUT-231      Upgrade QM reports to use Clover 2.6    Little Progress
> > Not
> > > that required in the release(developer thing)
> > >  MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.    In
> > > Progress
> > > MAHOUT-155      ARFF VectorIterable      Little Progress
> > > MAHOUT-214      Implement Stacked RBM     Little Progress
> > >
> > >
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: Release thinking

Posted by Drew Farris <dr...@gmail.com>.
On Fri, Feb 5, 2010 at 11:17 AM, Ted Dunning <te...@gmail.com> wrote:
> I just marked the 0.1 and 0.2 releases as released (about time).  This makes
> the JIRA road map feature more usable.
>
> See here for the live version of this summary:
> https://issues.apache.org/jira/browse/MAHOUT?report=com.atlassian.jira.plugin.system.project:roadmap-panel
>

Very nice, thanks Ted.

Re: Release thinking

Posted by Ted Dunning <te...@gmail.com>.
I just marked the 0.1 and 0.2 releases as released (about time).  This makes
the JIRA road map feature more usable.

See here for the live version of this summary:
https://issues.apache.org/jira/browse/MAHOUT?report=com.atlassian.jira.plugin.system.project:roadmap-panel

On Fri, Feb 5, 2010 at 3:16 AM, Robin Anil <ro...@gmail.com> wrote:

> Reviving this thread. Copy paste the whole thing as we move forward
>
> Current Snapshot
>
> Key     Summary
> > MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern
> mining
> >    Done
> > MAHOUT-227      Parallel SVM   In Progress
> > MAHOUT-240      Parallel version of Perceptron   Little Progress
> > MAHOUT-241      Example for perceptron     Little Progress
> > MAHOUT-185      Add mahout shell script for easy launching of various
> > algorithms   In Progress
> > MAHOUT-153      Implement kmeans++ for initial cluster selection in
> > kmeans    Little Progress  (There is discussion, but no patch yet)
> > MAHOUT-232      Implementation of sequential SVM solver based on Pegasos
>    In
> > Progress
> > MAHOUT-228      Need sequential logistic regression implementation using
> > SGD techniques     In Progress
>
> MAHOUT-263      Matrix interface should extend Iterable<Vector> for better
> > integration with distributed storage   Done
> > MAHOUT-237      Map/Reduce Implementation of Document Vectorizer   Done
> > MAHOUT-220      Mahout Bayes Code cleanup     Done
>
> MAHOUT-265      Error with creating MVC from Lucene Index or Arff     Done
> > MAHOUT-215      Provide jars with mahout release.     Done
> > MAHOUT-209      Add aggregate() methods for Vector     Done
> > MAHOUT-231      Upgrade QM reports to use Clover 2.6    Little Progress
> Not
> > that required in the release(developer thing)
> >  MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.    In
> > Progress
> > MAHOUT-155      ARFF VectorIterable      Little Progress
> > MAHOUT-214      Implement Stacked RBM     Little Progress
> >
> >
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Release thinking

Posted by Robin Anil <ro...@gmail.com>.
Reviving this thread. Copy paste the whole thing as we move forward

Current Snapshot

Key             Summary    Status
MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern mining    Done
MAHOUT-227      Parallel SVM    In Progress
MAHOUT-240      Parallel version of Perceptron    Little Progress
MAHOUT-241      Example for perceptron    Little Progress
MAHOUT-185      Add mahout shell script for easy launching of various algorithms    In Progress
MAHOUT-153      Implement kmeans++ for initial cluster selection in kmeans    Little Progress (There is discussion, but no patch yet)
MAHOUT-232      Implementation of sequential SVM solver based on Pegasos    In Progress
MAHOUT-228      Need sequential logistic regression implementation using SGD techniques    In Progress
MAHOUT-263      Matrix interface should extend Iterable<Vector> for better integration with distributed storage    Done
MAHOUT-237      Map/Reduce Implementation of Document Vectorizer    Done
MAHOUT-220      Mahout Bayes Code cleanup    Done
MAHOUT-265      Error with creating MVC from Lucene Index or Arff    Done
MAHOUT-215      Provide jars with mahout release.    Done
MAHOUT-209      Add aggregate() methods for Vector    Done
MAHOUT-231      Upgrade QM reports to use Clover 2.6    Little Progress (not that required in the release; developer thing)
MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.    In Progress
MAHOUT-155      ARFF VectorIterable    Little Progress
MAHOUT-214      Implement Stacked RBM    Little Progress

Re: Release thinking

Posted by Drew Farris <dr...@gmail.com>.
On Mon, Jan 25, 2010 at 8:34 PM, Drew Farris <dr...@gmail.com> wrote:
>
>> MAHOUT-215      Provide jars with mahout release.
>
> +1 working on this now.

A patch is now available for this in JIRA. Feedback appreciated.

Re: Release thinking

Posted by Jake Mannix <ja...@gmail.com>.
On Mon, Jan 25, 2010 at 5:34 PM, Drew Farris <dr...@gmail.com> wrote:
>
>
> I'd like to get the LLR Collocation finder in there too, but we'll
> have to see how much time I have to work on that wrt to the release
> timeframe. Perhaps the simpler version that's available in the current
> patch if I don't get to the refactoring.
>

+1 - I want to go over some of the stuff you have already, because we
can iterate - the initial form you already have looks pretty good, we can
make it better later if need be.

  -jake

Re: Release thinking

Posted by Drew Farris <dr...@gmail.com>.
On Mon, Jan 25, 2010 at 1:55 PM, Sean Owen <sr...@gmail.com> wrote:

> MAHOUT-185      Add mahout shell script for easy launching of various algorithms

+1, if only because I hate trying to remember the command-line
arguments for mvn exec:java. It becomes much easier to write
documentation/examples if it is easy to launch the algorithms and/or
utilities.

> MAHOUT-215      Provide jars with mahout release.

+1 working on this now.

> MAHOUT-153      Implement kmeans++ for initial cluster selection in kmeans

Would be nice to see, I believe two different people are working on
it, but I haven't seen any patches.

> MAHOUT-237      Map/Reduce Implementation of Document Vectorizer

+1

I'd like to get the LLR Collocation finder in there too, but we'll
have to see how much time I have to work on that with respect to the
release timeframe. Perhaps the simpler version that's available in the
current patch if I don't get to the refactoring.

Drew

Re: Release thinking

Posted by zhao zhendong <zh...@gmail.com>.
Hi all,

I will do my best to get this in 0.3 release.

{quote}
> MAHOUT-232      Implementation of sequential SVM solver based on Pegasos
>
This patch looks to be progressing - it would be really nice to get it in.
{quote}

Cheers,
Zhendong

-- 
-------------------------------------------------------------

Zhen-Dong Zhao (Maxim)

<><<><><><><><><><>><><><><><>>>>>>

Department of Computer Science
School of Computing
National University of Singapore

>>>>>>><><><><><><><><<><>><><<<<<<

Re: Release thinking

Posted by Isabel Drost <is...@apache.org>.
On Mon Jake Mannix <ja...@gmail.com> wrote:
> On Mon, Jan 25, 2010 at 10:55 AM, Sean Owen <sr...@gmail.com> wrote:
> 
> > Agree that we should start planning 0.3, as it will take over a
> > month I bet to actually be ready.
> >
> 
> +1 to releasing within a month or so.

+1 here as well. I think it would be great to reach a shorter
release cycle for Mahout.

Isabel

Re: Release thinking

Posted by Isabel Drost <is...@apache.org>.
On Mon Ted Dunning <te...@gmail.com> wrote:

> 240 can be WONT-FIX'ed.

+1


> I think that Isabel may have something for 241.

Nothing that I see as ready to go into 0.3.

Isabel

Re: Release thinking

Posted by Ted Dunning <te...@gmail.com>.
I am glad that I waited to read Jake's summary, since it saved me typing:
I agree with every one of his points.

The only difference I have is here:

> Key     Summary
> MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern
mining
> MAHOUT-227      Parallel SVM
> MAHOUT-240      Parallel version of Perceptron
> MAHOUT-241      Example for perceptron
>

I think that 227 should be delayed (all the action is on the Pegasos issue
right now and it may make 227 irrelevant) and 240 can be WONT-FIX'ed.

It sounds like 221 is done and I think that Isabel may have something for
241.

On Mon, Jan 25, 2010 at 11:13 AM, Jake Mannix <ja...@gmail.com> wrote:

> Them's my $0.03 (inflation).


I only added $0.01.  Our average is good, though.

-- 
Ted Dunning, CTO
DeepDyve

Re: Release thinking

Posted by Isabel Drost <is...@apache.org>.
On Mon Grant Ingersoll <gs...@apache.org> wrote:
> >> MAHOUT-231      Upgrade QM reports to use Clover 2.6
> >> 
> > 
> > No idea on this one.
> 
> That should be independent of a release, I would think.

It is. What would be needed is adjusting our pom and the Hudson job
that builds the reports.

Isabel

Re: Release thinking

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 25, 2010, at 2:13 PM, Jake Mannix wrote:
> 
> 
>> MAHOUT-215      Provide jars with mahout release.
>> 
> 
> ++1 on this one getting in.  "showstopper" I'd say.

Not sure it is a showstopper, but it is important.


> 
> 
>> MAHOUT-209      Add aggregate() methods for Vector
>> 
> 
> It would be really nice to stop monkeying around with the basic linear
> primitive interfaces, because even though we have AbstractXYZ base classes
> which can implement most of this stuff... we just should.  So that's my way
> of
> saying I should either code this up, or close it as Won't Fix.  Should not
> be
> postponed to 0.4
> 
> 
>> MAHOUT-231      Upgrade QM reports to use Clover 2.6
>> 
> 
> No idea on this one.

That should be independent of a release, I would think.

> 
> 
>> MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.
>> 
> 
> Has anyone looked at this in a million years?

I keep promising to do it, but then never do.

> 
> 
>> MAHOUT-155      ARFF VectorIterable
>> 
> 
> We already can convert ARFF to our Vector, do we also need an iterable?
> Should this just be folded into some kind of "Vectorizer", the output being
> the usual SequenceFile<Integer, VectorWritable> which will be a basic input
> into HDFS-backed matrices?

I think we have some support for ARFF, but I believe there are still some open needs here, which is why I never closed it.


-Grant

Re: Release thinking

Posted by Jake Mannix <ja...@gmail.com>.
On Mon, Jan 25, 2010 at 10:55 AM, Sean Owen <sr...@gmail.com> wrote:

> Agree that we should start planning 0.3, as it will take over a month
> I bet to actually be ready.
>

+1 to releasing within a month or so.


> How about everyone take a moment to focus on what's marked for 0.3?
> for any issue that concerns you:
>

Some thoughts inline, before we start reclassifying JIRA tickets
all over the place and I lose track of them:


> Key     Summary
> MAHOUT-221      Implementation of FP-Bonsai Pruning for fast pattern mining
> MAHOUT-227      Parallel SVM
> MAHOUT-240      Parallel version of Perceptron
> MAHOUT-241      Example for perceptron
>

I don't know about any of these really.


> MAHOUT-185      Add mahout shell script for easy launching of various
> algorithms
>

This is pretty key. I can add in some Properties-file-based ways of doing
this as well, so that not everything is on the CLI.  We don't need a perfect
patch here, but a good start would be nice to commit.
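
A minimal sketch of the properties-file idea, to make it concrete. This is
not taken from the MAHOUT-185 patch: the class name LauncherOptions, the
"--key value" argument convention, and the keys numClusters and maxIter are
all hypothetical. It only shows defaults loaded via java.util.Properties
being overridden by whatever is passed on the command line.

```java
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of Mahout.
public class LauncherOptions {

    // Copies the defaults, then overlays "--key value" pairs from args.
    // Arguments are read two at a time; a pair whose first element does
    // not start with "--" is ignored.
    static Properties merge(Properties defaults, String[] args) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                merged.setProperty(args[i].substring(2), args[i + 1]);
            }
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        Properties defaults = new Properties();
        // Stand-in for a real file; in practice this would be a Reader
        // opened on a per-algorithm .properties file (assumed layout).
        defaults.load(new StringReader("numClusters=10\nmaxIter=20\n"));

        Properties merged = merge(defaults, args);
        for (String key : merged.stringPropertyNames()) {
            System.out.println(key + "=" + merged.getProperty(key));
        }
    }
}
```

With something like this, a launcher script would only need to name the
algorithm and any options that differ from the file, rather than spelling
out every argument each time.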


> MAHOUT-153      Implement kmeans++ for initial cluster selection in kmeans
>

Is there progress on this one?


> MAHOUT-232      Implementation of sequential SVM solver based on Pegasos
>

This patch looks to be progressing - it would be really nice to get it in.


> MAHOUT-228      Need sequential logistic regression implementation using
> SGD techniques
>

This is looking great so far and should make it in for this release.


> MAHOUT-263      Matrix interface should extend Iterable<Vector> for better
> integration with distributed storage
>

I've got a patch with this already, but I need to integrate the usage of
this with the o.a.m.math.decomposer impls properly.  Unit tests aren't
succeeding with this patch yet.  But it should be in for this release.


> MAHOUT-237      Map/Reduce Implementation of Document Vectorizer
>

Basically done, right?  Should be in 0.3


> MAHOUT-220      Mahout Bayes Code cleanup
>

Ditto for this one.


> MAHOUT-265      Error with creating MVC from Lucene Index or Arff
>

One-line fix for me, I'll get to this shortly.


> MAHOUT-215      Provide jars with mahout release.
>

++1 on this one getting in.  "showstopper" I'd say.


> MAHOUT-209      Add aggregate() methods for Vector
>

It would be really nice to stop monkeying around with the basic linear
primitive interfaces, because even though we have AbstractXYZ base classes
which can implement most of this stuff... we just should.  So that's my way
of saying I should either code this up, or close it as Won't Fix.  Should not
be postponed to 0.4


> MAHOUT-231      Upgrade QM reports to use Clover 2.6
>

No idea on this one.


> MAHOUT-106      PLSI/EM in pig based on hofmann's ACM 04 paper.
>

Has anyone looked at this in a million years?


> MAHOUT-155      ARFF VectorIterable
>

We already can convert ARFF to our Vector, do we also need an iterable?
Should this just be folded into some kind of "Vectorizer", the output being
the usual SequenceFile<Integer, VectorWritable> which will be a basic input
into HDFS-backed matrices?


> MAHOUT-214      Implement Stacked RBM
>

This needs to go to 0.4, no progress has been made on this, but I don't
want to see it disappear from view into the black hole of "someday" just
yet.

Them's my $0.03 (inflation).

  -jake

Re: Release thinking

Posted by Sean Owen <sr...@gmail.com>.
Agree that we should start planning 0.3, as it will take over a month
I bet to actually be ready.

How about everyone take a moment to focus on what's marked for 0.3?
for any issue that concerns you:

- defer to 0.4
- or resolve if already done
- or go ahead and fix/implement it

... and ideally this is done for all issues in the next couple weeks.

Key	Summary
MAHOUT-221	Implementation of FP-Bonsai Pruning for fast pattern mining
MAHOUT-227	Parallel SVM
MAHOUT-240	Parallel version of Perceptron
MAHOUT-241	Example for perceptron
MAHOUT-185	Add mahout shell script for easy launching of various algorithms
MAHOUT-153	Implement kmeans++ for initial cluster selection in kmeans
MAHOUT-232	Implementation of sequential SVM solver based on Pegasos
MAHOUT-228	Need sequential logistic regression implementation using SGD techniques
MAHOUT-263	Matrix interface should extend Iterable<Vector> for better integration with distributed storage
MAHOUT-237	Map/Reduce Implementation of Document Vectorizer
MAHOUT-220	Mahout Bayes Code cleanup
MAHOUT-265	Error with creating MVC from Lucene Index or Arff
MAHOUT-215	Provide jars with mahout release.
MAHOUT-209	Add aggregate() methods for Vector
MAHOUT-231	Upgrade QM reports to use Clover 2.6
MAHOUT-106	PLSI/EM in pig based on hofmann's ACM 04 paper.
MAHOUT-155	ARFF VectorIterable
MAHOUT-214	Implement Stacked RBM


On Mon, Jan 25, 2010 at 6:16 PM, Dawid Weiss <da...@gmail.com> wrote:
> I strongly support this -- ironically, we in Carrot2 also need such a
> release (versioned, with Maven artefact to refer to). HPPC works more
> than fine for us, but portions of the code are bound to Colt and we
> can't easily switch all of it to HPPC yet.
>
> I'd apply that patch for sorting first though, it's definitely a bug.
>
> Dawid
>
> On Mon, Jan 25, 2010 at 6:08 PM, Benson Margulies <bi...@gmail.com> wrote:
>> I would be very happy to see a release with the Colt collections
>> before we switch, just so that there is something in the central repo
>> with an ASL license and some primitive collections. What's the current
>> release thinking?
>>
>> p.s. I'll start on the math split tonight.
>>
>

Re: Release thinking

Posted by Dawid Weiss <da...@gmail.com>.
I strongly support this -- ironically, we in Carrot2 also need such a
release (versioned, with Maven artefact to refer to). HPPC works more
than fine for us, but portions of the code are bound to Colt and we
can't easily switch all of it to HPPC yet.

I'd apply that patch for sorting first though, it's definitely a bug.

Dawid

On Mon, Jan 25, 2010 at 6:08 PM, Benson Margulies <bi...@gmail.com> wrote:
> I would be very happy to see a release with the Colt collections
> before we switch, just so that there is something in the central repo
> with an ASL license and some primitive collections. What's the current
> release thinking?
>
> p.s. I'll start on the math split tonight.
>