Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/09/28 19:53:34 UTC

0.2

Not too many open at this point:  https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true

Some are relatively minor, others are ready, but just need a final  
review.  Can we push towards mid-October for a release?  Anyone  
volunteer to be the release mgr?

-Grant

Re: 0.2

Posted by Robin Anil <ro...@gmail.com>.
I would like to add a few generic util classes, like an n-tuple of string, int,
or float. I am sure it will help a lot of algorithms.
FPGrowth, it seems, can go with this release. I have done some work over the
weekend: the perf has improved, and I am adding filters for checking closed
patterns to remove repetition. I am sure 1-2 weeks will be enough to complete
the parallel version.



On Tue, Sep 29, 2009 at 12:26 AM, David Hall <dl...@cs.berkeley.edu> wrote:

> 2009/9/28 Grant Ingersoll <gs...@apache.org>:
> >
> > On Sep 28, 2009, at 2:16 PM, Ted Dunning wrote:
> >
> >> Many of these are actually nearly (completely) done.
> >>
> >> Is there a goal for the 0.2 release other than fixing outstanding
> issues?
> >
> > I'd like to see some of the performance issues around SparseVector taken
> care of.  I think we also said we wanted to get the Random Forest and Bayes
> stuff in that Robin and Deneche are working on.
> >
> > Beyond that, I plan on doing some profiling of the LDA stuff.  I'd say we
> are pretty close.
>
> From my memory, using hprof, it looks like most of the time is spent doing
> math.
>
> (I haven't had a chance to try out YourKit, though.)
>
> -- David
>
> >
> >
> >>
> >> On Mon, Sep 28, 2009 at 10:53 AM, Grant Ingersoll <gsingers@apache.org
> >wrote:
> >>
> >>> Not too many open at this point:
> >>>
> https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
> >>>
> >>> Some are relatively minor, others are ready, but just need a final
> review.
> >>> Can we push towards mid-October for a release?  Anyone volunteer to be
> the
> >>> release mgr?
> >>>
> >>> -Grant
> >>>
> >>
> >>
> >>
> >> --
> >> Ted Dunning, CTO
> >> DeepDyve
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
>

Re: 0.2

Posted by David Hall <dl...@cs.berkeley.edu>.
2009/9/28 Grant Ingersoll <gs...@apache.org>:
>
> On Sep 28, 2009, at 2:16 PM, Ted Dunning wrote:
>
>> Many of these are actually nearly (completely) done.
>>
>> Is there a goal for the 0.2 release other than fixing outstanding issues?
>
> I'd like to see some of the performance issues around SparseVector taken care of.  I think we also said we wanted to get the Random Forest and Bayes stuff in that Robin and Deneche are working on.
>
> Beyond that, I plan on doing some profiling of the LDA stuff.  I'd say we are pretty close.

From my memory, using hprof, it looks like most of the time is spent doing math.

(I haven't had a chance to try out YourKit, though.)

-- David

>
>
>>
>> On Mon, Sep 28, 2009 at 10:53 AM, Grant Ingersoll <gs...@apache.org>wrote:
>>
>>> Not too many open at this point:
>>> https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
>>>
>>> Some are relatively minor, others are ready, but just need a final review.
>>> Can we push towards mid-October for a release?  Anyone volunteer to be the
>>> release mgr?
>>>
>>> -Grant
>>>
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>

Re: 0.2

Posted by Isabel Drost <is...@apache.org>.
On Monday 28 September 2009 20:27:11 Grant Ingersoll wrote:
> Beyond that, I plan on doing some profiling of the LDA stuff.  I'd say
> we are pretty close.

+1

Would like to brush up our web page in the coming weeks - but this can be done 
after the release as well: I would like to add some links into the wiki to 
the Getting started stuff, add some more detailed information on svn access, 
restructure some menus and the like.

Isabel

-- 
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_  
 |,4-  ) )-,_..;\ (  `'-' 
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>


Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 28, 2009, at 2:16 PM, Ted Dunning wrote:

> Many of these are actually nearly (completely) done.
>
> Is there a goal for the 0.2 release other than fixing outstanding  
> issues?

I'd like to see some of the performance issues around SparseVector  
taken care of.  I think we also said we wanted to get the Random  
Forest and Bayes stuff in that Robin and Deneche are working on.

Beyond that, I plan on doing some profiling of the LDA stuff.  I'd say  
we are pretty close.


>
> On Mon, Sep 28, 2009 at 10:53 AM, Grant Ingersoll  
> <gs...@apache.org>wrote:
>
>> Not too many open at this point:
>> https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
>>
>> Some are relatively minor, others are ready, but just need a final  
>> review.
>> Can we push towards mid-October for a release?  Anyone volunteer to  
>> be the
>> release mgr?
>>
>> -Grant
>>
>
>
>
> -- 
> Ted Dunning, CTO
> DeepDyve

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: 0.2

Posted by Ted Dunning <te...@gmail.com>.
Many of these are actually nearly (completely) done.

Is there a goal for the 0.2 release other than fixing outstanding issues?

On Mon, Sep 28, 2009 at 10:53 AM, Grant Ingersoll <gs...@apache.org>wrote:

> Not too many open at this point:
> https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
>
> Some are relatively minor, others are ready, but just need a final review.
>  Can we push towards mid-October for a release?  Anyone volunteer to be the
> release mgr?
>
> -Grant
>



-- 
Ted Dunning, CTO
DeepDyve

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
Sure. We've gone a while without a release though. I only suggest
that, at this point, anything that's not done is something that should
be finished in 0.3, including said speedups. How about that? Recall
the downside to *not* releasing all these improvements we've been
sitting on for even another 2 weeks.

On Tue, Sep 29, 2009 at 2:08 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Well, we should go through and evaluate what is open and whether it really
> should be in 0.2, instead of just fixing a date and cutting things off.
>  There are a few open items that I think need to be in 0.2, most importantly
> the SparseVector speedups.

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 12, 2009, at 8:52 PM, Jake Mannix wrote:

> On Mon, Oct 12, 2009 at 5:37 PM, Grant Ingersoll  
> <gs...@apache.org>wrote:
>
>>
>> On Oct 12, 2009, at 6:18 PM, Jake Mannix wrote:
>>
>> Yeah, I'm suggesting that any discussion about Colt/cMath/etc be  
>> for 0.3,
>>> not now.  The changes in M-165 don't require any library changes -  
>>> they're
>>> all internal to Mahout's vector impls.
>>>
>>
>> Yeah, except Shashi says it doesn't perform.
>
>
> Was the problem with Ted's "impossible confusion" fixed?  If not, that
> could be the cause of performance problems, in that it may keep looking
> for things that it thinks should be in the OpenMap, but aren't found...

I don't think it has.

Re: 0.2

Posted by Jake Mannix <ja...@gmail.com>.
On Mon, Oct 12, 2009 at 5:37 PM, Grant Ingersoll <gs...@apache.org>wrote:

>
> On Oct 12, 2009, at 6:18 PM, Jake Mannix wrote:
>
>  Yeah, I'm suggesting that any discussion about Colt/cMath/etc be for 0.3,
>> not now.  The changes in M-165 don't require any library changes - they're
>> all internal to Mahout's vector impls.
>>
>
> Yeah, except Shashi says it doesn't perform.


Was the problem with Ted's "impossible confusion" fixed?  If not, that could
be the cause of performance problems, in that it may keep looking for things
that it thinks should be in the OpenMap, but aren't found...

I don't think I can even get the current patch to apply at all, actually.  I
tried checking out a couple of revisions ago, but I couldn't get your 10/1
patch to apply cleanly to any of them.

  -jake

>
>
>
>>  -jake
>>
>> On Mon, Oct 12, 2009 at 3:09 PM, Sean Owen <sr...@gmail.com> wrote:
>>
>>  I don't have a strong view on Colt vs anything else. The only thing
>>> that would concern me here would be to let this block 0.2, if it's not
>>> even fully clear what the change will be, or implemented or tested.
>>> This is months off at this rate? Without a clear picture that this is
>>> getting wrapped up in a week, I'd strongly push the modest suggestion
>>> that it simply not be part of 0.2. Absolutely not saying it shouldn't
>>> be done. Not even saying it should be done soon -- I think 0.3 should
>>> follow soon and in general we should release more often.
>>>
>>> We're another week on in the discussion about releasing 0.2. Two folks
>>> seem ready to go. May I ask again what it seems 0.2 can't be released
>>> without? Having put a load of changes I'm keen to get into the wild
>>> myself, I'm aware of the drawbacks to letting this drag on a while. I
>>> really feel like people have "1.0" in mind when they say "0.2". This
>>> definitely doesn't need to be perfect, just roughly stable and a
>>> significant iteration over 0.1, and it is.
>>>
>>> Could I ask anyone that really wants this issue to be in 0.2 to at
>>> least name a deadline and create a plan to make it happen? seems like
>>> a reasonable request now. Otherwise it's 0.3.
>>>
>>> On Mon, Oct 12, 2009 at 9:43 PM, Grant Ingersoll <gs...@apache.org>
>>> wrote:
>>>
>>>> I think 165 needs to be in this release, it is a pretty big performance
>>>> issue.  I'm leaning towards the Colt stuff at the moment.  Perhaps in
>>>>
>>> 0.3,
>>>
>>>> we can refocus on how we want to attack the matrix stuff.
>>>>
>>>>
>>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>

Re: 0.2

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
I'm inclined towards Sean's perspective. Making the kinds of significant 
changes to the vector implementation that 165 entails strikes me as
non-trivial and likely to delay 0.2 significantly. I vote to not include 
it in this point release so that the functionality which is ready to go 
public can get released. What we have now seems to work adequately even 
if it does not scale as well as we can imagine it should. Support for 
100k cardinality sparse vectors would be a fine focus point for 0.3 and 
I'm willing to help make it happen.

Jeff



Grant Ingersoll wrote:
>
> On Oct 12, 2009, at 6:18 PM, Jake Mannix wrote:
>
>> Yeah, I'm suggesting that any discussion about Colt/cMath/etc be for 
>> 0.3,
>> not now.  The changes in M-165 don't require any library changes - 
>> they're
>> all internal to Mahout's vector impls.
>
> Yeah, except Shashi says it doesn't perform.
>
>>
>>  -jake
>>
>> On Mon, Oct 12, 2009 at 3:09 PM, Sean Owen <sr...@gmail.com> wrote:
>>
>>> I don't have a strong view on Colt vs anything else. The only thing
>>> that would concern me here would be to let this block 0.2, if it's not
>>> even fully clear what the change will be, or implemented or tested.
>>> This is months off at this rate? Without a clear picture that this is
>>> getting wrapped up in a week, I'd strongly push the modest suggestion
>>> that it simply not be part of 0.2. Absolutely not saying it shouldn't
>>> be done. Not even saying it should be done soon -- I think 0.3 should
>>> follow soon and in general we should release more often.
>>>
>>> We're another week on in the discussion about releasing 0.2. Two folks
>>> seem ready to go. May I ask again what it seems 0.2 can't be released
>>> without? Having put a load of changes I'm keen to get into the wild
>>> myself, I'm aware of the drawbacks to letting this drag on a while. I
>>> really feel like people have "1.0" in mind when they say "0.2". This
>>> definitely doesn't need to be perfect, just roughly stable and a
>>> significant iteration over 0.1, and it is.
>>>
>>> Could I ask anyone that really wants this issue to be in 0.2 to at
>>> least name a deadline and create a plan to make it happen? seems like
>>> a reasonable request now. Otherwise it's 0.3.
>>>
>>> On Mon, Oct 12, 2009 at 9:43 PM, Grant Ingersoll <gs...@apache.org>
>>> wrote:
>>>> I think 165 needs to be in this release, it is a pretty big 
>>>> performance
>>>> issue.  I'm leaning towards the Colt stuff at the moment.  Perhaps in
>>> 0.3,
>>>> we can refocus on how we want to attack the matrix stuff.
>>>>
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>


Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 12, 2009, at 6:18 PM, Jake Mannix wrote:

> Yeah, I'm suggesting that any discussion about Colt/cMath/etc be for  
> 0.3,
> not now.  The changes in M-165 don't require any library changes -  
> they're
> all internal to Mahout's vector impls.

Yeah, except Shashi says it doesn't perform.

>
>  -jake
>
> On Mon, Oct 12, 2009 at 3:09 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> I don't have a strong view on Colt vs anything else. The only thing
>> that would concern me here would be to let this block 0.2, if it's  
>> not
>> even fully clear what the change will be, or implemented or tested.
>> This is months off at this rate? Without a clear picture that this is
>> getting wrapped up in a week, I'd strongly push the modest suggestion
>> that it simply not be part of 0.2. Absolutely not saying it shouldn't
>> be done. Not even saying it should be done soon -- I think 0.3 should
>> follow soon and in general we should release more often.
>>
>> We're another week on in the discussion about releasing 0.2. Two  
>> folks
>> seem ready to go. May I ask again what it seems 0.2 can't be released
>> without? Having put a load of changes I'm keen to get into the wild
>> myself, I'm aware of the drawbacks to letting this drag on a while. I
>> really feel like people have "1.0" in mind when they say "0.2". This
>> definitely doesn't need to be perfect, just roughly stable and a
>> significant iteration over 0.1, and it is.
>>
>> Could I ask anyone that really wants this issue to be in 0.2 to at
>> least name a deadline and create a plan to make it happen? seems like
>> a reasonable request now. Otherwise it's 0.3.
>>
>> On Mon, Oct 12, 2009 at 9:43 PM, Grant Ingersoll  
>> <gs...@apache.org>
>> wrote:
>>> I think 165 needs to be in this release, it is a pretty big  
>>> performance
>>> issue.  I'm leaning towards the Colt stuff at the moment.  Perhaps  
>>> in
>> 0.3,
>>> we can refocus on how we want to attack the matrix stuff.
>>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: 0.2

Posted by Jake Mannix <ja...@gmail.com>.
Yeah, I'm suggesting that any discussion about Colt/cMath/etc be for 0.3,
not now.  The changes in M-165 don't require any library changes - they're
all internal to Mahout's vector impls.

  -jake

On Mon, Oct 12, 2009 at 3:09 PM, Sean Owen <sr...@gmail.com> wrote:

> I don't have a strong view on Colt vs anything else. The only thing
> that would concern me here would be to let this block 0.2, if it's not
> even fully clear what the change will be, or implemented or tested.
> This is months off at this rate? Without a clear picture that this is
> getting wrapped up in a week, I'd strongly push the modest suggestion
> that it simply not be part of 0.2. Absolutely not saying it shouldn't
> be done. Not even saying it should be done soon -- I think 0.3 should
> follow soon and in general we should release more often.
>
> We're another week on in the discussion about releasing 0.2. Two folks
> seem ready to go. May I ask again what it seems 0.2 can't be released
> without? Having put a load of changes I'm keen to get into the wild
> myself, I'm aware of the drawbacks to letting this drag on a while. I
> really feel like people have "1.0" in mind when they say "0.2". This
> definitely doesn't need to be perfect, just roughly stable and a
> significant iteration over 0.1, and it is.
>
> Could I ask anyone that really wants this issue to be in 0.2 to at
> least name a deadline and create a plan to make it happen? seems like
> a reasonable request now. Otherwise it's 0.3.
>
> On Mon, Oct 12, 2009 at 9:43 PM, Grant Ingersoll <gs...@apache.org>
> wrote:
> > I think 165 needs to be in this release, it is a pretty big performance
> > issue.  I'm leaning towards the Colt stuff at the moment.  Perhaps in
> 0.3,
> > we can refocus on how we want to attack the matrix stuff.
> >
>

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I don't have a strong view on Colt vs anything else. The only thing
that would concern me here would be to let this block 0.2, if it's not
even fully clear what the change will be, or implemented or tested.
This is months off at this rate? Without a clear picture that this is
getting wrapped up in a week, I'd strongly push the modest suggestion
that it simply not be part of 0.2. Absolutely not saying it shouldn't
be done. Not even saying it should be done soon -- I think 0.3 should
follow soon and in general we should release more often.

We're another week on in the discussion about releasing 0.2. Two folks
seem ready to go. May I ask again what it seems 0.2 can't be released
without? Having put a load of changes I'm keen to get into the wild
myself, I'm aware of the drawbacks to letting this drag on a while. I
really feel like people have "1.0" in mind when they say "0.2". This
definitely doesn't need to be perfect, just roughly stable and a
significant iteration over 0.1, and it is.

Could I ask anyone that really wants this issue to be in 0.2 to at
least name a deadline and create a plan to make it happen? Seems like
a reasonable request now. Otherwise it's 0.3.

On Mon, Oct 12, 2009 at 9:43 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I think 165 needs to be in this release, it is a pretty big performance
> issue.  I'm leaning towards the Colt stuff at the moment.  Perhaps in 0.3,
> we can refocus on how we want to attack the matrix stuff.
>

Re: 0.2

Posted by Jake Mannix <ja...@gmail.com>.
On Mon, Oct 12, 2009 at 1:43 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I think 165 needs to be in this release, it is a pretty big performance
> issue.  I'm leaning towards the Colt stuff at the moment.  Perhaps in 0.3,
> we can refocus on how we want to attack the matrix stuff.

Didn't Ted say that he thought Colt wasn't the best?

Ted said (in some other thread):
> Regarding Colt, I don't think that Colt is even all that close to the
> state of the art for linear algebra performance available in Java.  It
> used to be the pinnacle, but various other libraries have substantially
> eclipsed it.  Other libraries have the added benefit of being able to
> make use of Atlas or other platform specific implementations where
> available.  This can give outrageous performance.

I've got a copy of steal-able (unacceptably licensed hep.aida.* code
removed) colt if we did want to try and do some performance comparisons
of colt vs commons-math vs [what are the other contenders now?] for 0.3.

What's the status of the patch on MAHOUT-165?  I brought it up to compiling
status a week or so ago, but I don't know if anyone's been looking at it
since...

  -jake

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
I think 165 needs to be in this release, it is a pretty big  
performance issue.  I'm leaning towards the Colt stuff at the moment.   
Perhaps in 0.3, we can refocus on how we want to attack the matrix  
stuff.

On Oct 12, 2009, at 3:14 PM, Jake Mannix wrote:

> Has it been decided to push MAHOUT-165 out to 0.3?  It seems like the
> discussion around linear algebra primitives last left off that we'd  
> probably
> be moving to:
>
>  a) incorporate commons-math for our vectors and matrices,
>
>  b) provide a consistent API wrapper which deals with indexing via  
> Labels
>
> In light of this, MAHOUT-165 could go either 0.2 or 0.3, yes?  (ie, if
> it's not in 0.2, then maybe it will be obviated by a) and b) above)
>
>  -jake
>
> On Mon, Oct 12, 2009 at 11:05 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> I am ready too. Same question, what is left that must block 0.2 and  
>> what is
>> the ETA looking like?
>>
>> On Oct 12, 2009 6:07 PM, "Robin Anil" <ro...@gmail.com> wrote:
>>
>> Everything looks good from my side. I will work on the launcher and  
>> tidying
>> up Bayes classifier, the next couple of days. Any idea on a target  
>> date? If
>> there is time, I would like to spend those precious amazon credits to
>> register some performance numbers.
>> Robin
>>
>> On Tue, Oct 6, 2009 at 5:53 PM, Isabel Drost <is...@apache.org>  
>> wrote: >
>> On
>> Tue, 6 Oct 2009 17:36...
>>



Re: 0.2

Posted by Jake Mannix <ja...@gmail.com>.
Has it been decided to push MAHOUT-165 out to 0.3?  It seems like the
discussion around linear algebra primitives last left off that we'd probably
be moving to:

  a) incorporate commons-math for our vectors and matrices,

  b) provide a consistent API wrapper which deals with indexing via Labels

In light of this, MAHOUT-165 could go in either 0.2 or 0.3, yes?  (i.e., if
it's not in 0.2, then maybe it will be obviated by a) and b) above)

  -jake

On Mon, Oct 12, 2009 at 11:05 AM, Sean Owen <sr...@gmail.com> wrote:

> I am ready too. Same question, what is left that must block 0.2 and what is
> the ETA looking like?
>
> On Oct 12, 2009 6:07 PM, "Robin Anil" <ro...@gmail.com> wrote:
>
> Everything looks good from my side. I will work on the launcher and tidying
> up Bayes classifier, the next couple of days. Any idea on a target date? If
> there is time, I would like to spend those precious amazon credits to
> register some performance numbers.
> Robin
>
> On Tue, Oct 6, 2009 at 5:53 PM, Isabel Drost <is...@apache.org> wrote: >
> On
> Tue, 6 Oct 2009 17:36...
>

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
http://cwiki.apache.org/MAHOUT/how-to-release.html


On Oct 16, 2009, at 1:04 AM, Sean Owen wrote:

> I suppose I have volunteered for the release. What does it entail,
> making the release? I don't have knowledge of this.
>
> ... or MAHOUT-114 or what it means to sign these jars?
>
> If info is available I can try to figure these out.
>
> On Thu, Oct 15, 2009 at 10:19 AM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>> OK.  The Sparse vector improvements we have now are already a lot  
>> faster
>> than what was in 0.1, so that is good.  I'd suggest that whoever is  
>> the
>> Release Mgr. for this release takes care of the signing stuff.   
>> I'll look at
>> the Label (LLR) stuff by Monday.
>>



Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I suppose I have volunteered for the release. What does it entail,
making the release? I don't have knowledge of this.

... or MAHOUT-114 or what it means to sign these jars?

If info is available I can try to figure these out.

On Thu, Oct 15, 2009 at 10:19 AM, Grant Ingersoll <gs...@apache.org> wrote:
> OK.  The Sparse vector improvements we have now are already a lot faster
> than what was in 0.1, so that is good.  I'd suggest that whoever is the
> Release Mgr. for this release takes care of the signing stuff.  I'll look at
> the Label (LLR) stuff by Monday.
>

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
OK.  The Sparse vector improvements we have now are already a lot  
faster than what was in 0.1, so that is good.  I'd suggest that  
whoever is the Release Mgr. for this release takes care of the signing  
stuff.  I'll look at the Label (LLR) stuff by Monday.

On Oct 15, 2009, at 1:02 PM, Jeff Eastman wrote:

> I'd vote to delay 165 for 0.3 but do it in trunk asap after 0.2 so  
> folks can get their hands on it.
>
> Sean Owen wrote:
>> It still sounds somewhat significant to me. Either it's rushed or
>> takes a while and both seem negative.
>>
> +1 This is why
>> I think it is vital, at least, to put a schedule on this, or else we
>> are basically saying 0.2 is to not be released indefinitely, and
>> that's no good. Last time we said we'd finish up and release this was
>> 2 weeks ago, and there hasn't been progress on this issue.
>>
>> I'm starting to feel strongly enough to call for a vote?
>>
>> On Thu, Oct 15, 2009 at 6:47 AM, Grant Ingersoll  
>> <gs...@apache.org> wrote:
>>
>>> I don't think it is that big.  We can likely just make another
>>> implementation of Vector.  We don't have to convert everything to  
>>> Colt.
>>>
>>
>>
>>
>



Re: 0.2

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
I'd vote to delay 165 for 0.3 but do it in trunk asap after 0.2 so folks 
can get their hands on it.

Sean Owen wrote:
> It still sounds somewhat significant to me. Either it's rushed or
> takes a while and both seem negative.
>   
+1 This is why
> I think it is vital, at least, to put a schedule on this, or else we
> are basically saying 0.2 is to not be released indefinitely, and
> that's no good. Last time we said we'd finish up and release this was
> 2 weeks ago, and there hasn't been progress on this issue.
>
> I'm starting to feel strongly enough to call for a vote?
>
> On Thu, Oct 15, 2009 at 6:47 AM, Grant Ingersoll <gs...@apache.org> wrote:
>   
>> I don't think it is that big.  We can likely just make another
>> implementation of Vector.  We don't have to convert everything to Colt.
>>     
>
>
>   


Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
It still sounds somewhat significant to me. Either it's rushed or
takes a while and both seem negative.

I think it is vital, at least, to put a schedule on this, or else we
are basically saying 0.2 is to not be released indefinitely, and
that's no good. Last time we said we'd finish up and release this was
2 weeks ago, and there hasn't been progress on this issue.

I'm starting to feel strongly enough to call for a vote?

On Thu, Oct 15, 2009 at 6:47 AM, Grant Ingersoll <gs...@apache.org> wrote:
> I don't think it is that big.  We can likely just make another
> implementation of Vector.  We don't have to convert everything to Colt.

Re: 0.2

Posted by Jake Mannix <ja...@gmail.com>.
On Thu, Oct 15, 2009 at 6:47 AM, Grant Ingersoll <gs...@apache.org>wrote:

>
> On Oct 15, 2009, at 8:22 AM, Sean Owen wrote:
>
>  On Thu, Oct 15, 2009 at 4:57 AM, Grant Ingersoll <gs...@apache.org>
>> wrote:
>>
>>>       MAHOUT-165      Using better primitives hash for sparse vector for
>>>> performance gains                Open    14/Oct/09
>>>>
>>>> Per discussion, move the remainder (migration to Colt or something) to
>>>> 0.3
>>>>
>>>
>>> I will try to get to this, as I think it is important.
>>>
>>
>> I agree with Jeff that the migration to a new framework is a big
>> change and should be left to 0.3. (Vote?) There is a whole lot of
>> change already, more than might normally go into a point release.
>> Since you have another blocker below, and limited time, I say don't
>> kill yourself to work on this. It's going to be hard to get it done in
>> a weekend.
>>
>>
>
> I don't think it is that big.  We can likely just make another
> implementation of Vector.  We don't have to convert everything to Colt.
>

Ted's patch (since monkeyed with by you and myself) has the other
implementation of Vector, but testing showed it's slower?  This patch also
had  a significant refactoring of the Vector hierarchy so it's not just "a
new class".

I'm all for getting this in as soon as we can, because this issue (well,
finalizing on a linear api) pretty much blocks my donating decomposer to
Mahout, but it looks like you're the only one who feels strongly about
resolving M-165 for 0.2, Grant.

Can we not just have 0.3 in another 6-8 weeks or so which covers this?  What
Mahout user is getting blocked by having too-slow sparse vectors currently?

  -jake

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 15, 2009, at 8:22 AM, Sean Owen wrote:

> On Thu, Oct 15, 2009 at 4:57 AM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>>>        MAHOUT-165      Using better primitives hash for sparse  
>>> vector for
>>> performance gains                Open    14/Oct/09
>>>
>>> Per discussion, move the remainder (migration to Colt or  
>>> something) to 0.3
>>
>> I will try to get to this, as I think it is important.
>
> I agree with Jeff that the migration to a new framework is a big
> change and should be left to 0.3. (Vote?) There is a whole lot of
> change already, more than might normally go into a point release.
> Since you have another blocker below, and limited time, I say don't
> kill yourself to work on this. It's going to be hard to get it done in
> a weekend.
>


I don't think it is that big.  We can likely just make another  
implementation of Vector.  We don't have to convert everything to Colt.

>
>>>        MAHOUT-114      Release Process Needs to sign published
>>> dependencies such
>>> as Hadoop, etc.          Open    06/Apr/09
>>>
>>> Not clear on status here, mark as 0.3?
>>
>> This is a blocker for 0.2 and thus must be completed.  That being  
>> said, I
>> think Hadoop is now publishing to the Maven repo, so we may be able  
>> to stop
>> our own publishing of Hadoop.
>>
>>



Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
On Thu, Oct 15, 2009 at 4:57 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>        MAHOUT-165      Using better primitives hash for sparse vector for
>> performance gains                Open    14/Oct/09
>>
>> Per discussion, move the remainder (migration to Colt or something) to 0.3
>
> I will try to get to this, as I think it is important.

I agree with Jeff that the migration to a new framework is a big
change and should be left to 0.3. (Vote?) There is a whole lot of
change already, more than might normally go into a point release.
Since you have another blocker below, and limited time, I say don't
kill yourself to work on this. It's going to be hard to get it done in
a weekend.


>>        MAHOUT-114      Release Process Needs to sign published
>> dependencies such
>> as Hadoop, etc.          Open    06/Apr/09
>>
>> Not clear on status here, mark as 0.3?
>
> This is a blocker for 0.2 and thus must be completed.  That being said, I
> think Hadoop is now publishing to the Maven repo, so we may be able to stop
> our own publishing of Hadoop.
>
>

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 15, 2009, at 7:21 AM, Sean Owen wrote:

> Here's what is marked 0.2 plus suggested actions. I am basically
> suggesting the things that are 'pretty ready' be submitted and
> published -- if they're 85% done, definitely good enough for an 0.2
> release, and worth getting them play-tested. (Or else, decide they
> need another month or two, and mark for 0.3) And then that takes care
> of just about everything for 0.2
>
>
> 	MAHOUT-163	Get (better) cluster labels using Log Likelihood Ratio		
> Open	 17/Sep/09
>
> No recent action here, but seemed ready enough to submit as of last
> patch. Do so or mark 0.3?

I will make sure to get this one in before the release.

>
> 	MAHOUT-171	Move deployment to repository.apache.org		 Open	 02/Oct/09
>
> Seems ready to submit?

+1. It would be great to have Maven snapshots available for nightly  
builds.

>
> 	MAHOUT-185	Add mahout shell script for easy launching of various
> algorithms		 Open	 06/Oct/09
>
> Very new, sounds like something for 0.3

This is mostly for convenience.  Would be nice to have in 0.2, but not  
a show stopper.

>
> 	MAHOUT-170	Enable Java compile optimize flag during build		 Open	  
> 07/Oct/09
>
> Go ahead and submit? the original change seemed quite uncontroversial.
> Robin suggested a further change. Either submit or mark 0.3
>
> 	MAHOUT-186	Classifier PriorityQueue returns erroneous results		
> Patch Available	 08/Oct/09
>
> Two patches available. I would like my patch for this issue to get
> some feedback -- would prefer it be submitted or some even better
> hybrid of it and the first patch.
>
> 	MAHOUT-148	Convert Classification Algs to use richer Writable
> syntax		 Patch Available	 10/Oct/09
>
> Ready to submit?
>
> 	MAHOUT-157	Frequent Pattern Mining using Parallel FP-Growth		 Patch
> Available	 13/Oct/09
>
> Seems like still work in progress. If it's 'good enough', submit and
> continue iterating. Or mark 0.3
>
> 	MAHOUT-165	Using better primitives hash for sparse vector for
> performance gains		 Open	 14/Oct/09
>
> Per discussion, move the remainder (migration to Colt or something)  
> to 0.3

I will try to get to this, as I think it is important.

>
> 	MAHOUT-106	PLSI/EM in pig based on hofmann's ACM 04 paper.		 Patch
> Available	 27/Aug/09
>
> This looks like something better tagged as 'unknown version'; don't
> understand the status


I had hoped to do this, but let's move it to 0.3

>
> 	MAHOUT-114	Release Process Needs to sign published dependencies such
> as Hadoop, etc.		 Open	 06/Apr/09
>
> Not clear on status here, mark as 0.3?

This is a blocker for 0.2 and thus must be completed.  That being  
said, I think Hadoop is now publishing to the Maven repo, so we may be  
able to stop our own publishing of Hadoop.


Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 19, 2009, at 12:14 PM, Sean Owen wrote:

> Almost there. MAHOUT-114 is OK as far as I can tell but need to
> verify, during the actual release, it does push out signatures.
>
> That leaves...
>
> On Fri, Oct 16, 2009 at 2:26 PM, Sean Owen <sr...@gmail.com> wrote:
>> MAHOUT-163 Grant
>
> Grant you wanted to work on this? looking close or want to push it  
> out?

I'm going to push it.  It's too Lucene specific at this point.  I will  
mark it as such.

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
Almost there. MAHOUT-114 is OK as far as I can tell but need to
verify, during the actual release, it does push out signatures.

That leaves...

On Fri, Oct 16, 2009 at 2:26 PM, Sean Owen <sr...@gmail.com> wrote:
> MAHOUT-163 Grant

Grant you wanted to work on this? looking close or want to push it out?

> MAHOUT-171 Isabel

I think this is ready to submit? by all means

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I'm targeting Monday to begin releasing 0.2. The remaining issues are
MAHOUT-114 which I just followed up on, and the following, which
largely look like they are ready to submit?

MAHOUT-163 Grant
MAHOUT-171 Isabel
MAHOUT-186 Robin
MAHOUT-148 Robin
MAHOUT-157 Robin
MAHOUT-170 Robin

I think you're clear to submit what you have, and, continue afterwards
as needed. Otherwise I'll mark 'em 0.3 on Monday.

On Thu, Oct 15, 2009 at 4:21 AM, Sean Owen <sr...@gmail.com> wrote:
> Here's what is marked 0.2 plus suggested actions. I am basically
> suggesting the things that are 'pretty ready' be submitted and
> published -- if they're 85% done, definitely good enough for an 0.2
> release, and worth getting them play-tested. (Or else, decide they
> need another month or two, and mark for 0.3) And then that takes care
> of just about everything for 0.2

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
Here's what is marked 0.2 plus suggested actions. I am basically
suggesting the things that are 'pretty ready' be submitted and
published -- if they're 85% done, definitely good enough for an 0.2
release, and worth getting them play-tested. (Or else, decide they
need another month or two, and mark for 0.3) And then that takes care
of just about everything for 0.2


	MAHOUT-163	Get (better) cluster labels using Log Likelihood Ratio		
Open	 17/Sep/09

No recent action here, but seemed ready enough to submit as of last
patch. Do so or mark 0.3?

 	MAHOUT-171	Move deployment to repository.apache.org		 Open	 02/Oct/09

Seems ready to submit?

 	MAHOUT-185	Add mahout shell script for easy launching of various
algorithms		 Open	 06/Oct/09

Very new, sounds like something for 0.3

 	MAHOUT-170	Enable Java compile optimize flag during build		 Open	 07/Oct/09

Go ahead and submit? the original change seemed quite uncontroversial.
Robin suggested a further change. Either submit or mark 0.3

 	MAHOUT-186	Classifier PriorityQueue returns erroneous results		
Patch Available	 08/Oct/09

Two patches available. I would like my patch for this issue to get
some feedback -- would prefer it be submitted or some even better
hybrid of it and the first patch.

 	MAHOUT-148	Convert Classification Algs to use richer Writable
syntax		 Patch Available	 10/Oct/09

Ready to submit?

 	MAHOUT-157	Frequent Pattern Mining using Parallel FP-Growth		 Patch
Available	 13/Oct/09

Seems like still work in progress. If it's 'good enough', submit and
continue iterating. Or mark 0.3

 	MAHOUT-165	Using better primitives hash for sparse vector for
performance gains		 Open	 14/Oct/09

Per discussion, move the remainder (migration to Colt or something) to 0.3

 	MAHOUT-106	PLSI/EM in pig based on hofmann's ACM 04 paper.		 Patch
Available	 27/Aug/09

This looks like something better tagged as 'unknown version'; don't
understand the status

 	MAHOUT-114	Release Process Needs to sign published dependencies such
as Hadoop, etc.		 Open	 06/Apr/09

Not clear on status here, mark as 0.3?

On Mon, Oct 12, 2009 at 11:05 AM, Sean Owen <sr...@gmail.com> wrote:
> I am ready too. Same question, what is left that must block 0.2 and what is
> the ETA looking like?
>
> On Oct 12, 2009 6:07 PM, "Robin Anil" <ro...@gmail.com> wrote:
>
> Everything looks good from my side. I will work on the launcher and tidying
> up Bayes classifier, the next couple of days. Any idea on a target date? If
> there is time, I would like to spend those precious amazon credits to
> register some performance numbers.
> Robin
>
> On Tue, Oct 6, 2009 at 5:53 PM, Isabel Drost <is...@apache.org> wrote: > On
> Tue, 6 Oct 2009 17:36...

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I am ready too. Same question, what is left that must block 0.2 and what is
the ETA looking like?

On Oct 12, 2009 6:07 PM, "Robin Anil" <ro...@gmail.com> wrote:

Everything looks good from my side. I will work on the launcher and tidying
up Bayes classifier, the next couple of days. Any idea on a target date? If
there is time, I would like to spend those precious amazon credits to
register some performance numbers.
Robin

On Tue, Oct 6, 2009 at 5:53 PM, Isabel Drost <is...@apache.org> wrote: > On
Tue, 6 Oct 2009 17:36...

Re: 0.2

Posted by Robin Anil <ro...@gmail.com>.
Everything looks good from my side. I will work on the launcher and on tidying
up the Bayes classifier over the next couple of days. Any idea on a target date?
If there is time, I would like to spend those precious Amazon credits to
record some performance numbers.
Robin

On Tue, Oct 6, 2009 at 5:53 PM, Isabel Drost <is...@apache.org> wrote:

> On Tue, 6 Oct 2009 17:36:10 +0530
> Robin Anil <ro...@gmail.com> wrote:
>
> > On Tue, Oct 6, 2009 at 4:54 PM, Sean Owen <sr...@gmail.com> wrote:
> >
> > > I vote for setting them by default -- do we have a driver script in
> > > which we could set this? if so I say go for it.
> > >
> > Wouldn't it be good if we could group the major launch points (main) of
> > various (at least the stable ones) algorithms into a shell script? I
> > don't think much work is needed for a bare-bones one, and it could make
> > it into 0.2?
>
> If you do so, please also update the Getting-started documentation (+
> the pages on classification and clustering with Mahout) in the wiki.
>
> Cheers,
> Isabel
>

Re: 0.2

Posted by Isabel Drost <is...@apache.org>.
On Tue, 6 Oct 2009 17:36:10 +0530
Robin Anil <ro...@gmail.com> wrote:

> On Tue, Oct 6, 2009 at 4:54 PM, Sean Owen <sr...@gmail.com> wrote:
> 
> > I vote for setting them by default -- do we have a driver script in
> > which we could set this? if so I say go for it.
> >
> Wouldn't it be good if we could group the major launch points (main) of
> various (at least the stable ones) algorithms into a shell script? I
> don't think much work is needed for a bare-bones one, and it could make
> it into 0.2?

If you do so, please also update the Getting-started documentation (+
the pages on classification and clustering with Mahout) in the wiki.

Cheers,
Isabel

Re: 0.2

Posted by Robin Anil <ro...@gmail.com>.
On Tue, Oct 6, 2009 at 4:54 PM, Sean Owen <sr...@gmail.com> wrote:

> I vote for setting them by default -- do we have a driver script in
> which we could set this? if so I say go for it.
>
Wouldn't it be good if we could group the major launch points (main) of
various (at least the stable ones) algorithms into a shell script? I don't
think much work is needed for a bare-bones one, and it could make it into 0.2?

mahout classify -algorithm bayes [OPTIONS]
mahout cluster -algorithm canopy  [OPTIONS]
mahout fpm -algorithm pfpgrowth [OPTIONS]
mahout taste -algorithm slopeone [OPTIONS]
mahout misc -algorithm createVectors [OPTIONS]
mahout examples WikipediaExample
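
A bare-bones launcher along these lines could be sketched as follows. This is
only an illustration: the function names, the MAHOUT_CLASSPATH variable, and
the class names are hypothetical placeholders, not actual Mahout drivers, and
the script prints the java command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch of a "mahout COMMAND -algorithm NAME [OPTIONS]" launcher.
# All class names are placeholders invented for illustration.

# Map a command/algorithm pair to a (placeholder) main class name.
resolve_main_class() {
  case "$1/$2" in
    classify/bayes)  echo "example.classifier.BayesDriver" ;;
    cluster/canopy)  echo "example.clustering.CanopyDriver" ;;
    fpm/pfpgrowth)   echo "example.fpm.PFPGrowthDriver" ;;
    taste/slopeone)  echo "example.taste.SlopeOneDriver" ;;
    *)               return 1 ;;
  esac
}

# Print the java command line that would be launched.
build_launch_command() {
  if [ $# -lt 3 ] || [ "$2" != "-algorithm" ]; then
    echo "usage: mahout COMMAND -algorithm NAME [OPTIONS]" >&2
    return 2
  fi
  cmd="$1"
  algorithm="$3"
  shift 3
  main_class=$(resolve_main_class "$cmd" "$algorithm") || {
    echo "unknown command/algorithm: $cmd/$algorithm" >&2
    return 1
  }
  # Remaining arguments are passed through to the driver untouched.
  echo "java -cp ${MAHOUT_CLASSPATH:-mahout.jar} $main_class $*"
}
```

Keeping the command-to-class mapping in one case statement means adding a new
algorithm is a one-line change, which fits the "bare bones" goal.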

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I vote for setting them by default -- do we have a driver script in
which we could set this? if so I say go for it.

On Tue, Oct 6, 2009 at 12:21 PM, Robin Anil <ro...@gmail.com> wrote:
> I meant the JVM (java), not javac. The question is whether we should enforce
> them as default values (as we deal with processor-intensive algorithms) or
> leave it to the user to try them for better performance.
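
If the flags are set by default in a driver script, one option is to apply
them only when the user has not supplied their own. A minimal sketch, assuming
a hypothetical MAHOUT_JAVA_OPTS environment variable; the two flags are the
ones named in this thread, and whether a given JVM accepts them depends on its
version.

```shell
#!/bin/sh
# Hypothetical sketch: default runtime (java, not javac) flags with a user override.
# MAHOUT_JAVA_OPTS is an assumed variable name, not an existing Mahout setting.

# Flags proposed in the thread as defaults for processor-intensive jobs.
default_jvm_opts() {
  echo "-XX:+AggressiveOpts -XX:+UseFastAccessorMethods"
}

# Use the user's value if the variable is set at all (even to empty),
# otherwise fall back to the defaults above.
effective_jvm_opts() {
  if [ -n "${MAHOUT_JAVA_OPTS+set}" ]; then
    echo "$MAHOUT_JAVA_OPTS"
  else
    default_jvm_opts
  fi
}
```

With this shape, setting MAHOUT_JAVA_OPTS to an empty string disables the
defaults entirely, so users keep full control over the JVM options.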

Re: 0.2

Posted by Robin Anil <ro...@gmail.com>.
I meant the JVM (java), not javac. The question is whether we should enforce
them as default values (as we deal with processor-intensive algorithms) or
leave it to the user to try them for better performance.


On Tue, Oct 6, 2009 at 4:28 PM, Sean Owen <sr...@gmail.com> wrote:

> I support adding the -O javac flag (which is what the original patch
> does), as it doesn't hurt, even if it does little. But yeah, like I said
> in the issue comments, the other flags are not javac flags. They
> are java flags.
>
> So just to be clear, where are you wanting to put the flags?
>
> On Tue, Oct 6, 2009 at 10:28 AM, Robin Anil <ro...@gmail.com> wrote:
> > About
> > https://issues.apache.org/jira/browse/MAHOUT-170  Enable Java compile
> > optimize flag during build: any reason why we shouldn't add the
> > -XX:+AggressiveOpts
> > -XX:+UseFastAccessorMethods flags to the JVM?
> >
> > Robin
> >
> >
>

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I support adding the -O javac flag (which is what the original patch
does), as it doesn't hurt, even if it does little. But yeah, like I said
in the issue comments, the other flags are not javac flags. They
are java flags.

So just to be clear, where are you wanting to put the flags?

On Tue, Oct 6, 2009 at 10:28 AM, Robin Anil <ro...@gmail.com> wrote:
> About
> https://issues.apache.org/jira/browse/MAHOUT-170  Enable Java compile
> optimize flag during build: any reason why we shouldn't add the
> -XX:+AggressiveOpts
> -XX:+UseFastAccessorMethods flags to the JVM?
>
> Robin
>
>

Re: 0.2

Posted by Robin Anil <ro...@gmail.com>.
About
https://issues.apache.org/jira/browse/MAHOUT-170  Enable Java compile
optimize flag during build: any reason why we shouldn't add the
-XX:+AggressiveOpts
-XX:+UseFastAccessorMethods flags to the JVM?

Robin


On Tue, Oct 6, 2009 at 2:55 PM, Robin Anil <ro...@gmail.com> wrote:

> I am currently converting the entire Bayes/CBayes code to a Writable +
> WritableComparable StringTuple format, instead of using Text. So that takes
> care of MAHOUT-148.
> I will take up MAHOUT-157 to finish the parallel version once
> Bayes/CBayes looks clean enough.
>
>
>
> On Tue, Oct 6, 2009 at 2:41 PM, Isabel Drost <is...@apache.org> wrote:
>
>> On Tue, 6 Oct 2009 09:18:38 +0100
>> Sean Owen <sr...@gmail.com> wrote:
>>
>> > How is everyone feeling about 0.2? it's a week later, some issues have
>> > been closed. If there hasn't been movement on an issue marked for 0.2
>> > in the last week, might it be a good time to consider moving it to
>> > 0.3? or else I guess I'm interested to hear a game plan on anything
>> > that hasn't been touched in a week, yet must be part of 0.2, keeping
>> > in mind the benefits of getting the large amount of work since 0.1 out
>> > to the public. Release early/often, especially when you're in 0.x
>> > versions.
>> >
>> > Concretely, let me propose we fix the two bugs open for 0.2, and mark
>> > the rest as 0.3?
>> >
>> > http://issues.apache.org/jira/browse/MAHOUT-181
>> > http://issues.apache.org/jira/browse/MAHOUT-114 (is this a 'bug'?)
>>
>> MAHOUT-157 Frequent Pattern Mining using Parallel FP-Growth
>>
>> I think this should go in. Robin has made great progress and put in an
>> updated patch late last week. I would love to thoroughly review said
>> patch, but currently am unable to find time to do so. From what I
>> looked at over the weekend, it does look good. Currently the status is:
>> It implements a very fast, highly optimized serial version of the
>> algorithm that I would love to see for 0.2. The parallel version then
>> can go into 0.3. Sean, could you please have a closer look at the code
>> to spot any problems that would block it from being committed?
>>
>>
>> MAHOUT-165 Using better primitives hash for sparse vector for
>> performance gains
>>
>> Judging from the comments, people are still working on it.
>>
>> MAHOUT-171 Move deployment to repository.apache.org
>>
>> I am fine if that is thrown out, yet the upcoming release would be a
>> nice chance to test the setup. I would suggest to set a timebox for
>> testing the changes - if it does not work out, move it on to 0.3
>>
>> MAHOUT-138 Convert main() methods to use Commons CLI
>>
>> As there are quite a few methods that need changes I am fine with
>> leaving the issue as is and moving it over to 0.3 until all is
>> converted.
>>
>> MAHOUT-54 parallelize k-means sharing the predominance of canopies
>>
>> Judging from the comments this can safely be moved to 0.3 or even
>> closed as won't fix.
>>
>> MAHOUT-78 HBase RowResult/BatchUpdate access via Mahout Vector
>> interface
>>
>> Judging from the comments this can safely be moved to 0.3 or even be
>> marked as won't fix.
>>
>> As for the other few issues, I cannot comment.
>>
>> Isabel
>>
>
>

Re: 0.2

Posted by Robin Anil <ro...@gmail.com>.
I am currently converting the entire Bayes/CBayes code to a Writable +
WritableComparable StringTuple format, instead of using Text. So that takes
care of MAHOUT-148.
I will take up MAHOUT-157 to finish the parallel version once
Bayes/CBayes looks clean enough.


On Tue, Oct 6, 2009 at 2:41 PM, Isabel Drost <is...@apache.org> wrote:

> On Tue, 6 Oct 2009 09:18:38 +0100
> Sean Owen <sr...@gmail.com> wrote:
>
> > How is everyone feeling about 0.2? it's a week later, some issues have
> > been closed. If there hasn't been movement on an issue marked for 0.2
> > in the last week, might it be a good time to consider moving it to
> > 0.3? or else I guess I'm interested to hear a game plan on anything
> > that hasn't been touched in a week, yet must be part of 0.2, keeping
> > in mind the benefits of getting the large amount of work since 0.1 out
> > to the public. Release early/often, especially when you're in 0.x
> > versions.
> >
> > Concretely, let me propose we fix the two bugs open for 0.2, and mark
> > the rest as 0.3?
> >
> > http://issues.apache.org/jira/browse/MAHOUT-181
> > http://issues.apache.org/jira/browse/MAHOUT-114 (is this a 'bug'?)
>
> MAHOUT-157 Frequent Pattern Mining using Parallel FP-Growth
>
> I think this should go in. Robin has made great progress and put in an
> updated patch late last week. I would love to thoroughly review said
> patch, but currently am unable to find time to do so. From what I
> looked at over the weekend, it does look good. Currently the status is:
> It implements a very fast, highly optimized serial version of the
> algorithm that I would love to see for 0.2. The parallel version then
> can go into 0.3. Sean, could you please have a closer look at the code
> to spot any problems that would block it from being committed?
>
>
> MAHOUT-165 Using better primitives hash for sparse vector for
> performance gains
>
> Judging from the comments, people are still working on it.
>
> MAHOUT-171 Move deployment to repository.apache.org
>
> I am fine if that is thrown out, yet the upcoming release would be a
> nice chance to test the setup. I would suggest to set a timebox for
> testing the changes - if it does not work out, move it on to 0.3
>
> MAHOUT-138 Convert main() methods to use Commons CLI
>
> As there are quite a few methods that need changes I am fine with
> leaving the issue as is and moving it over to 0.3 until all is
> converted.
>
> MAHOUT-54 parallelize k-means sharing the predominance of canopies
>
> Judging from the comments this can safely be moved to 0.3 or even
> closed as won't fix.
>
> MAHOUT-78 HBase RowResult/BatchUpdate access via Mahout Vector
> interface
>
> Judging from the comments this can safely be moved to 0.3 or even be
> marked as won't fix.
>
> As for the other few issues, I cannot comment.
>
> Isabel
>

Re: 0.2

Posted by Isabel Drost <is...@apache.org>.
On Tue, 6 Oct 2009 09:18:38 +0100
Sean Owen <sr...@gmail.com> wrote:

> How is everyone feeling about 0.2? it's a week later, some issues have
> been closed. If there hasn't been movement on an issue marked for 0.2
> in the last week, might it be a good time to consider moving it to
> 0.3? or else I guess I'm interested to hear a game plan on anything
> that hasn't been touched in a week, yet must be part of 0.2, keeping
> in mind the benefits of getting the large amount of work since 0.1 out
> to the public. Release early/often, especially when you're in 0.x
> versions.
> 
> Concretely, let me propose we fix the two bugs open for 0.2, and mark
> the rest as 0.3?
> 
> http://issues.apache.org/jira/browse/MAHOUT-181
> http://issues.apache.org/jira/browse/MAHOUT-114 (is this a 'bug'?)

MAHOUT-157 Frequent Pattern Mining using Parallel FP-Growth

I think this should go in. Robin has made great progress and put in an
updated patch late last week. I would love to thoroughly review said
patch, but currently am unable to find time to do so. From what I
looked at over the weekend, it does look good. Currently the status is:
It implements a very fast, highly optimized serial version of the
algorithm that I would love to see for 0.2. The parallel version then
can go into 0.3. Sean, could you please have a closer look at the code
to spot any problems that would block it from being committed?


MAHOUT-165 Using better primitives hash for sparse vector for
performance gains 

Judging from the comments, people are still working on it.

MAHOUT-171 Move deployment to repository.apache.org

I am fine if that is thrown out, yet the upcoming release would be a
nice chance to test the setup. I would suggest to set a timebox for
testing the changes - if it does not work out, move it on to 0.3

MAHOUT-138 Convert main() methods to use Commons CLI

As there are quite a few methods that need changes I am fine with
leaving the issue as is and moving it over to 0.3 until all is
converted.

MAHOUT-54 parallelize k-means sharing the predominance of canopies

Judging from the comments this can safely be moved to 0.3 or even
closed as won't fix.

MAHOUT-78 HBase RowResult/BatchUpdate access via Mahout Vector
interface 

Judging from the comments this can safely be moved to 0.3 or even be
marked as won't fix.

As for the other few issues, I cannot comment.

Isabel

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
How is everyone feeling about 0.2? it's a week later, some issues have
been closed. If there hasn't been movement on an issue marked for 0.2
in the last week, might it be a good time to consider moving it to
0.3? or else I guess I'm interested to hear a game plan on anything
that hasn't been touched in a week, yet must be part of 0.2, keeping
in mind the benefits of getting the large amount of work since 0.1 out
to the public. Release early/often, especially when you're in 0.x
versions.

Concretely, let me propose we fix the two bugs open for 0.2, and mark
the rest as 0.3?

http://issues.apache.org/jira/browse/MAHOUT-181
http://issues.apache.org/jira/browse/MAHOUT-114 (is this a 'bug'?)



On Tue, Sep 29, 2009 at 5:47 PM, Sean Owen <sr...@gmail.com> wrote:
> I'd like to understand how you guys imagine the schedule for 0.2 then.
> For example, you're suggesting 0.2 is blocked by MAHOUT-54, which has
> been on the books for 16 months. Is it going to happen along with the
> others in... 1 more month? In the meantime, from where I'm sitting, I
> see sitting on massive improvements to the recommender engine that I'm
> eager to put out.
>
> I'm curious what the 'ideal' frequency of release is. I think it's
> about 6 weeks at this stage. Even if one disagrees by a factor of 4,
> we're late on 0.2. When can we draw a line on 0.2? Honestly asking the
> question.
>

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 29, 2009, at 12:47 PM, Sean Owen wrote:

> I'd like to understand how you guys imagine the schedule for 0.2 then.
> For example, you're suggesting 0.2 is blocked by MAHOUT-54, which has
> been on the books for 16 months. Is it going to happen along with the
> others in... 1 more month? In the meantime, from where I'm sitting, I
> see sitting on massive improvements to the recommender engine that I'm
> eager to put out.

We can push out M-54.  I was just proposing that we take a look at the  
issues open and address what needs to be in 0.2, instead of picking  
some date and saying if it isn't done by then, it's out.  Not to say  
we should ignore dates completely, just that we as committers should  
just make the decision in the next day or two, then that can drive a  
date.


>
> I'm curious what the 'ideal' frequency of release is. I think it's
> about 6 weeks at this stage.

That sounds fine, release early, release often.  But since we are all  
volunteers so far, it is hard to drive a date based schedule, IMO.  I  
think it is easier to drive a feature based schedule.

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I'd like to understand how you guys imagine the schedule for 0.2 then.
For example, you're suggesting 0.2 is blocked by MAHOUT-54, which has
been on the books for 16 months. Is it going to happen along with the
others in... 1 more month? In the meantime, from where I'm sitting, I
am sitting on massive improvements to the recommender engine that I'm
eager to put out.

I'm curious what the 'ideal' frequency of release is. I think it's
about 6 weeks at this stage. Even if one disagrees by a factor of 4,
we're late on 0.2. When can we draw a line on 0.2? Honestly asking the
question.

On Tue, Sep 29, 2009 at 11:42 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Instead of talking in theory:
> https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
>
> I'd say the following can be pushed:
> MAHOUT-155
> M-78
> M-148
>
> I'd like to get M-106 in, but it could be pushed.  All the rest of the items
> I think are important to be in 0.2.
>
> Of course, this is just my opinion.  And others can pick up any of the above
> and get them done.
>
> -Grant
>
> On Sep 29, 2009, at 6:24 AM, Sean Owen wrote:
>
>> OK. Don't want to be pushy of course. I would suggest we simply
>> replace '0.2' with '0.3' in that sentence myself. (Or perhaps ask --
>> if we cut a release right now, what if anything would make you say
>> 'oops, that really should have been fixed...') I think we're not
>> releasing nearly often enough myself, and that has implications. But
>> without just about unanimous consent we can't push something out.
>>
>> On Tue, Sep 29, 2009 at 7:39 AM, Isabel Drost <is...@apache.org> wrote:
>>>
>>> On Mon, 28 Sep 2009 21:08:32 -0400
>>> Grant Ingersoll <gs...@apache.org> wrote:
>>>
>>>>> I can drive this and propose we publish 0.2 from code as of this
>>>>> Friday. Is that too soon to polish what's there?
>>>
>>> Sorry, I wanted to spend some time on some of the issues this weekend* -
>>> won't have time before Friday.
>>>
>>>
>>>> Well, we should go through and evaluate what is open and whether it
>>>> really should be in 0.2, instead of just fixing a date and cutting
>>>> things off.  There are a few open items that I think need to be in
>>>> 0.2, most importantly the SparseVector speedups.
>>>
>>> +1
>>>
>>> Isabel
>>>
>>> * http://wiki.upstream-berlin.com/index.php/DevHouseBerlin2
>
>

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
Instead of talking in theory:
https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true

I'd say the following can be pushed:
MAHOUT-155
M-78
M-148

I'd like to get M-106 in, but it could be pushed.  All the rest of the  
items I think are important to be in 0.2.

Of course, this is just my opinion.  And others can pick up any of the  
above and get them done.

-Grant

On Sep 29, 2009, at 6:24 AM, Sean Owen wrote:

> OK. Don't want to be pushy of course. I would suggest we simply
> replace '0.2' with '0.3' in that sentence myself. (Or perhaps ask --
> if we cut a release right now, what if anything would make you say
> 'oops, that really should have been fixed...') I think we're not
> releasing nearly often enough myself, and that has implications. But
> without just about unanimous consent we can't push something out.
>
> On Tue, Sep 29, 2009 at 7:39 AM, Isabel Drost <is...@apache.org>  
> wrote:
>> On Mon, 28 Sep 2009 21:08:32 -0400
>> Grant Ingersoll <gs...@apache.org> wrote:
>>
>>>> I can drive this and propose we publish 0.2 from code as of this
>>>> Friday. Is that too soon to polish what's there?
>>
>> Sorry, I wanted to spend some time on some of the issues this  
>> weekend* -
>> won't have time before Friday.
>>
>>
>>> Well, we should go through and evaluate what is open and whether it
>>> really should be in 0.2, instead of just fixing a date and cutting
>>> things off.  There are a few open items that I think need to be in
>>> 0.2, most importantly the SparseVector speedups.
>>
>> +1
>>
>> Isabel
>>
>> * http://wiki.upstream-berlin.com/index.php/DevHouseBerlin2


Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
OK. Don't want to be pushy of course. I would suggest we simply
replace '0.2' with '0.3' in that sentence myself. (Or perhaps ask --
if we cut a release right now, what if anything would make you say
'oops, that really should have been fixed...') I think we're not
releasing nearly often enough myself, and that has implications. But
without just about unanimous consent we can't push something out.

On Tue, Sep 29, 2009 at 7:39 AM, Isabel Drost <is...@apache.org> wrote:
> On Mon, 28 Sep 2009 21:08:32 -0400
> Grant Ingersoll <gs...@apache.org> wrote:
>
>> > I can drive this and propose we publish 0.2 from code as of this
>> > Friday. Is that too soon to polish what's there?
>
> Sorry, I wanted to spend some time on some of the issues this weekend* -
> won't have time before Friday.
>
>
>> Well, we should go through and evaluate what is open and whether it
>> really should be in 0.2, instead of just fixing a date and cutting
>> things off.  There are a few open items that I think need to be in
>> 0.2, most importantly the SparseVector speedups.
>
> +1
>
> Isabel
>
> * http://wiki.upstream-berlin.com/index.php/DevHouseBerlin2
>

Re: 0.2

Posted by Isabel Drost <is...@apache.org>.
On Mon, 28 Sep 2009 21:08:32 -0400
Grant Ingersoll <gs...@apache.org> wrote:

> > I can drive this and propose we publish 0.2 from code as of this
> > Friday. Is that too soon to polish what's there?

Sorry, I wanted to spend some time on some of the issues this weekend* -
won't have time before Friday.

 
> Well, we should go through and evaluate what is open and whether it  
> really should be in 0.2, instead of just fixing a date and cutting  
> things off.  There are a few open items that I think need to be in  
> 0.2, most importantly the SparseVector speedups.

+1

Isabel

* http://wiki.upstream-berlin.com/index.php/DevHouseBerlin2

Re: 0.2

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 28, 2009, at 5:05 PM, Sean Owen wrote:

> +1 since it's probably been too long, and, it is appropriate to
> publish point releases at this stage simply to get bug fixes, API
> changes, and enhancements out there early. If you believe 1.0 should
> be released in about a year, we should think about releasing every 6
> weeks or so!
>
> There are most certainly enough changes in 0.2 already. I'd strongly
> suggest we not view this as a call to figure out what else we might
> like to start, work on, and maybe finish in a month -- just mark that
> 0.3.
>
> I can drive this and propose we publish 0.2 from code as of this
> Friday. Is that too soon to polish what's there?

Well, we should go through and evaluate what is open and whether it  
really should be in 0.2, instead of just fixing a date and cutting  
things off.  There are a few open items that I think need to be in  
0.2, most importantly the SparseVector speedups.

>
> On Mon, Sep 28, 2009 at 6:53 PM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>> Not too many open at this point:
>>  https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
>>
>> Some are relatively minor, others are ready, but just need a final  
>> review.
>>  Can we push towards mid-October for a release?  Anyone volunteer  
>> to be the
>> release mgr?
>>
>> -Grant
>>



Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
I imagine we're small and loose enough now that these things can go
together. What if I said, make it look releasable by Friday, and then
don't touch much until we release 0.2 the next week in short order?

Don't want to be too formal about this since 0.2 is overdue and we
should be in the habit of releasing more often.

All I am really opposed to, and I am not clear if anyone took it this
way, was *now* starting a month or two period of development on some
ideas people have had knocking around for a while. That's '0.3'
material now. We have plenty to push out with 0.2.

On Tue, Sep 29, 2009 at 1:30 AM, Ted Dunning <te...@gmail.com> wrote:
> To rephrase for confirmation, are you suggesting a function freeze this
> Friday followed before long in quick succession by a code freeze, bug fix
> fest and release?

Re: 0.2

Posted by Ted Dunning <te...@gmail.com>.
To rephrase for confirmation, are you suggesting a function freeze this
Friday followed before long in quick succession by a code freeze, bug fix
fest and release?

On Mon, Sep 28, 2009 at 2:05 PM, Sean Owen <sr...@gmail.com> wrote:

>
> I can drive this and propose we publish 0.2 from code as of this
> Friday. Is that too soon to polish what's there?




-- 
Ted Dunning, CTO
DeepDyve

Re: 0.2

Posted by Sean Owen <sr...@gmail.com>.
+1 since it's probably been too long, and, it is appropriate to
publish point releases at this stage simply to get bug fixes, API
changes, and enhancements out there early. If you believe 1.0 should
be released in about a year, we should think about releasing every 6
weeks or so!

There are most certainly enough changes in 0.2 already. I'd strongly
suggest we not view this as a call to figure out what else we might
like to start, work on, and maybe finish in a month -- just mark that
0.3.

I can drive this and propose we publish 0.2 from code as of this
Friday. Is that too soon to polish what's there?

On Mon, Sep 28, 2009 at 6:53 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Not too many open at this point:
>  https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310751&versionId=12313278&showOpenIssuesOnly=true
>
> Some are relatively minor, others are ready, but just need a final review.
>  Can we push towards mid-October for a release?  Anyone volunteer to be the
> release mgr?
>
> -Grant
>