Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/08/08 21:20:21 UTC

Thinking about next release

Now that GSOC is almost done, it seems like the time to start thinking about 0.4 (or maybe 0.9, i.e. the last release before 1.0?)  Obviously, we still need to incorporate much of the GSOC work, so reality says Sept. or October for a release, but maybe sooner if we are sufficiently motivated.

Thoughts?

-Grant

Re: Thinking about next release

Posted by Isabel Drost <is...@apache.org>.
On Mon, 9 Aug 2010  Robin Anil <ro...@gmail.com> wrote:
> +1 for calling it 0.4.

+1


> I feel 0.9 should be almost like 1.0 in terms of behavior which it
> is not at the moment.

I think we are somewhere close to 0.9 in terms of functionality, that
is, supported algorithms. However, when thinking of Mahout users, there
are still quite a few tasks that would make life much easier (see also
Ted's and Jake's comments).

One option I see for getting closer to a final 1.0 is to do 0.4 now
and concentrate on cleaning up APIs, supported data formats and such
for the following release. There seems to be quite a bit of interest
in that on the dev list right now.


Isabel


Re: Thinking about next release

Posted by Robin Anil <ro...@gmail.com>.
+1 for calling it 0.4. I feel 0.9 should be almost like 1.0 in terms of
behavior, which it is not at the moment. Let's get the features into this
release first. My plate for this release includes an FPGrowth bug and Zhao's
Pegasos and liblinear classifiers.

Robin

On Mon, Aug 9, 2010 at 5:41 PM, Jake Mannix <ja...@gmail.com> wrote:

> On Mon, Aug 9, 2010 at 5:22 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > It doesn't quite feel like 0.9 to me yet.
> >
> > To clarify, to me 0.9 means a compatibility release like in Lucene with
> > little functionality difference between 0.9 and 1.0.
> >
> > Jake and Robin and I were talking the other evening and a common lament
> > was that our classification (and clustering) stuff was all over the map
> > in terms of data structures.  Driving that to rest and getting those
> > components even vaguely as plug and play as our much more advanced
> > recommendation components would be very, very helpful.
> >
>
> I definitely agree.  I think we need to standardize classification, LDA,
> clustering, and recommendations to use some similar formats.  It would be
> really nice if someone could take the *same* input, and do LDA on it, SVD,
> and fuzzy k-means, and then use the induced metric on the inputs to decide
> which works best, and then use that to feed into an item-based recommender,
> for example.
>
> It's definitely not yet "almost 1.0", in my eyes, at least.  Once we have
> 1.0, we are going to have to be very careful about changing APIs, and I
> personally think we have some APIs which could change a bit in the next few
> minor releases, especially in the distributed linear algebra space (i.e.
> things related to DistributedRowMatrix and its cousins).
>
> We may also, for example, want to explore an API change/addition which
> allows "vectors" which are boolean or float valued (for space savings),
> and keyed on longs (for the truly large-scale case).
>
> Another thing that we will want to do before 1.0 is fully integrate what
> we want out of COLT.
>
> That's just off the top of my head, for things we should at least have
> some story about before 1.0.
>
>  -jake

Re: Thinking about next release

Posted by Jake Mannix <ja...@gmail.com>.
On Mon, Aug 9, 2010 at 5:22 PM, Ted Dunning <te...@gmail.com> wrote:

> It doesn't quite feel like 0.9 to me yet.
>
> To clarify, to me 0.9 means a compatibility release like in Lucene with
> little functionality difference between 0.9 and 1.0.
>
> Jake and Robin and I were talking the other evening and a common lament was
> that our classification (and clustering) stuff was all over the map in
> terms of data structures.  Driving that to rest and getting those
> components even vaguely as plug and play as our much more advanced
> recommendation components would be very, very helpful.
>

I definitely agree.  I think we need to standardize classification, LDA,
clustering, and recommendations to use some similar formats.  It would be
really nice if someone could take the *same* input, and do LDA on it, SVD,
and fuzzy k-means, and then use the induced metric on the inputs to decide
which works best, and then use that to feed into an item-based recommender,
for example.

It's definitely not yet "almost 1.0", in my eyes, at least.  Once we have
1.0, we are going to have to be very careful about changing APIs, and I
personally think we have some APIs which could change a bit in the next few
minor releases, especially in the distributed linear algebra space (i.e.
things related to DistributedRowMatrix and its cousins).

We may also, for example, want to explore an API change/addition which
allows "vectors" which are boolean or float valued (for space savings),
and keyed on longs (for the truly large-scale case).

Another thing that we will want to do before 1.0 is fully integrate what
we want out of COLT.

That's just off the top of my head, for things we should at least have
some story about before 1.0.

  -jake



Re: Thinking about next release

Posted by Ted Dunning <te...@gmail.com>.
It doesn't quite feel like 0.9 to me yet.

To clarify, to me 0.9 means a compatibility release like in Lucene with
little functionality difference between 0.9 and 1.0.

Jake and Robin and I were talking the other evening and a common lament was
that our classification (and clustering) stuff was all over the map in terms
of data structures.  Driving that to rest and getting those components even
vaguely as plug and play as our much more advanced recommendation components
would be very, very helpful.

On Mon, Aug 9, 2010 at 5:48 AM, Grant Ingersoll <gs...@apache.org> wrote:

> So, how about we shoot for pencils down on September 1?  That should give
> us enough time to incorporate GSOC, M-228, etc.  Then, we can do a 5-7 day
> freeze and then release.
>
> Any thoughts on 0.4 vs. 0.9?  I'm kind of leaning towards 0.9, but I don't
> want to paint us into a corner either.  From what I've seen, many of our
> APIs are firming up.  That being said, maybe two more releases pre 1.0 would
> be good.
>
> -Grant

Re: Thinking about next release

Posted by Grant Ingersoll <gs...@apache.org>.
So, how about we shoot for pencils down on September 1?  That should give us enough time to incorporate GSOC, M-228, etc.  Then, we can do a 5-7 day freeze and then release.

Any thoughts on 0.4 vs. 0.9?  I'm kind of leaning towards 0.9, but I don't want to paint us into a corner either.  From what I've seen, many of our APIs are firming up.  That being said, maybe two more releases pre 1.0 would be good.

-Grant

On Aug 9, 2010, at 5:54 AM, Sebastian Schelter wrote:

> Regarding the issues I work on, I wanna see MAHOUT-460 (add
> "maxPreferencesPerItemConsidered" to the ItemSimilarityJob) and
> MAHOUT-457 (make ItemSimilarityJob and RecommenderJob work on
> ElasticMapReduce) included in the 0.4 release. It should be no
> problem to get them done by September.
> 
> --sebastian



Re: Thinking about next release

Posted by Sebastian Schelter <ss...@googlemail.com>.
Regarding the issues I work on, I wanna see MAHOUT-460 (add
"maxPreferencesPerItemConsidered" to the ItemSimilarityJob) and
MAHOUT-457 (make ItemSimilarityJob and RecommenderJob work on
ElasticMapReduce) included in the 0.4 release. It should be no
problem to get them done by September.

--sebastian

On 08.08.2010 21:20, Grant Ingersoll wrote:
> Now that GSOC is almost done, it seems like the time to start thinking about 0.4 (or maybe 0.9, i.e. the last release before 1.0?)  Obviously, we still need to incorporate much of the GSOC work, so reality says Sept. or October for a release, but maybe sooner if we are sufficiently motivated.
>
> Thoughts?
>
> -Grant


Re: Thinking about next release

Posted by Sean Owen <sr...@gmail.com>.
Agree, I had been saying "September" in board reports. Sounds about
right. Let's call it 0.4 for now I think.

We all know it takes some time. Now would be the time to dig through
JIRA and update status. If it's done, mark it as such. If it's going
to happen by 0.4, mark it as such and finish it up. Otherwise mark as
0.5.

In a short while I will start that process of pushing out mails with a
list of open issues that don't match the above and we'll start hashing
it out. It goes faster if this can be pre-addressed by everyone here.

On Sun, Aug 8, 2010 at 2:20 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Now that GSOC is almost done, it seems like the time to start thinking about 0.4 (or maybe 0.9, i.e. the last release before 1.0?)  Obviously, we still need to incorporate much of the GSOC work, so reality says Sept. or October for a release, but maybe sooner if we are sufficiently motivated.
>
> Thoughts?
>
> -Grant

Re: Thinking about next release

Posted by Ted Dunning <te...@gmail.com>.
I think we should push for September if we can. I will personally push
M-228 in time for that.

On Aug 8, 2010, at 12:20 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Thoughts?