Posted to dev@mahout.apache.org by Tanton Gibbs <ta...@gmail.com> on 2009/09/14 18:23:06 UTC

Areas needing help

Hi,

I'd like to start working more with the mahout code, making small
improvements here and there.  I want to primarily focus on performance
improvements and unit testing (mainly because I enjoy doing that).
However, I'd like to improve a place that needs improvement.  If you
know of a section of code that you would like to see refactored/sped
up/tested could you please send it to the list or to me?  Or, if there
is a wiki page on this, please point me to it and accept my apologies.

Thanks!
Tanton

Re: Areas needing help

Posted by Ted Dunning <te...@gmail.com>.
Tanton,

Thanks in advance for the effort. I can't point you to anything specific just
at the moment, but I (and others) will come up with a list in short order.

On Mon, Sep 14, 2009 at 9:23 AM, Tanton Gibbs <ta...@gmail.com> wrote:

> Hi,
>
> I'd like to start working more with the mahout code, making small
> improvements here and there.  I want to primarily focus on performance
> improvements and unit testing (mainly because I enjoy doing that).
> However, I'd like to improve a place that needs improvement.  If you
> know of a section of code that you would like to see refactored/sped
> up/tested could you please send it to the list or to me?  Or, if there
> is a wiki page on this, please point me to it and accept my apologies.
>
> Thanks!
> Tanton
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Areas needing help

Posted by Isabel Drost <is...@apache.org>.
On Mon, 14 Sep 2009 18:42:30 +0100
Sean Owen <sr...@gmail.com> wrote:

> EMMA is super, and not too hard to set up. Run some code that exercises
> the library and it'll tell you what doesn't ever seem to be executed.
> Running all unit tests through it therefore shows quite clearly what's
> not getting touched by test execution.

There is an Eclipse plugin that makes running EMMA trivial (just
right-click on the src/test/java folder and select the "coverage as
unit tests" button). I would assume something similar exists for
IntelliJ.

There is also an Emma-Maven Plugin that checks coverage at build time
and can generate a nice coverage report for the project reports page.


Isabel

Re: Areas needing help

Posted by Sean Owen <sr...@gmail.com>.
Yeah, well at some point they were quite thorough. I admit I haven't
been good about adding tests for new code, but, still it's mostly
covered, versus having a couple tests here and there and most not
tested at all.

EMMA is super, and not too hard to set up. Run some code that exercises
the library and it'll tell you what doesn't ever seem to be executed.
Running all unit tests through it therefore shows quite clearly what's
not getting touched by test execution.

Go for it -- I would consider profiling a slightly higher priority,
but either is a great contribution. Along the way I am sure you will
both get familiar with the code and be able to suggest more changes
and extensions to it. And that is always good for a project.

On Mon, Sep 14, 2009 at 6:25 PM, Tanton Gibbs <ta...@gmail.com> wrote:
> If the CF unit tests have been started, then fleshing them out sounds
> like a good place to start.  I've not used EMMA, so that sounds like
> fun.
>
> Thanks for the pointers!
> Tanton
>
> On Mon, Sep 14, 2009 at 10:12 AM, Sean Owen <sr...@gmail.com> wrote:
>> FWIW I have been profiling the CF code a lot lately so think it has had a
>> recent, close look. I might turn profilers to areas that haven't had as
>> close a look.
>>
>> The CF unit tests are fine if lagging in completeness. I think it would be
>> perhaps relatively simpler to locate some untested bits and add a test or
>> two - if you can run EMMA on the current tests you will quickly find a few
>> gaps.
>>
>> I think as a result of trying to test some code you will locate some useful
>> ways to refactor and structure the code. That sort of thing strikes me as
>> important now. Best to make some big code moves early while the API is
>> understood to be very in flux, and to build a solid foundation.
>>
>> For instance this week I would like to continue by proposing to merge and
>> reshuffle the utils and common packages.
>>
>> On Sep 14, 2009 5:33 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>>
>> On Sep 14, 2009, at 12:23 PM, Tanton Gibbs wrote:
>> > Hi,
>> >
>> > I'd like to start working more with th...
>>
>> Testing and profiling of the clustering, classification and collab filtering
>> code would be very welcome.   There are several open issues in JIRA related
>> to these (MAHOUT-165 comes to mind).
>>
>> I think just running some examples at scale and reporting back results would
>> be great as well.  You can also start by looking at
>> https://issues.apache.org/jira/browse/MAHOUT
>>
>> One idea is to take the Wikipedia examples I put up at
>> https://www.ibm.com/developerworks/java/library/j-mahout/index.html (I will
>> donate the code soon) and try running them at larger scale for Wikipedia.
>>
>

Re: Areas needing help

Posted by Ted Dunning <te...@gmail.com>.
Does the standard Maven build run profiling?

On Mon, Sep 14, 2009 at 10:25 AM, Tanton Gibbs <ta...@gmail.com> wrote:

> If the CF unit tests have been started, then fleshing them out sounds
> like a good place to start.  I've not used EMMA, so that sounds like
> fun.
>
>

Re: Areas needing help

Posted by Tanton Gibbs <ta...@gmail.com>.
If the CF unit tests have been started, then fleshing them out sounds
like a good place to start.  I've not used EMMA, so that sounds like
fun.

Thanks for the pointers!
Tanton

On Mon, Sep 14, 2009 at 10:12 AM, Sean Owen <sr...@gmail.com> wrote:
> FWIW I have been profiling the CF code a lot lately so think it has had a
> recent, close look. I might turn profilers to areas that haven't had as
> close a look.
>
> The CF unit tests are fine if lagging in completeness. I think it would be
> perhaps relatively simpler to locate some untested bits and add a test or
> two - if you can run EMMA on the current tests you will quickly find a few
> gaps.
>
> I think as a result of trying to test some code you will locate some useful
> ways to refactor and structure the code. That sort of thing strikes me as
> important now. Best to make some big code moves early while the API is
> understood to be very in flux, and to build a solid foundation.
>
> For instance this week I would like to continue by proposing to merge and
> reshuffle the utils and common packages.
>
> On Sep 14, 2009 5:33 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>
> On Sep 14, 2009, at 12:23 PM, Tanton Gibbs wrote:
> > Hi,
> >
> > I'd like to start working more with th...
>
> Testing and profiling of the clustering, classification and collab filtering
> code would be very welcome.   There are several open issues in JIRA related
> to these (MAHOUT-165 comes to mind).
>
> I think just running some examples at scale and reporting back results would
> be great as well.  You can also start by looking at
> https://issues.apache.org/jira/browse/MAHOUT
>
> One idea is to take the Wikipedia examples I put up at
> https://www.ibm.com/developerworks/java/library/j-mahout/index.html (I will
> donate the code soon) and try running them at larger scale for Wikipedia.
>

Re: Areas needing help

Posted by Sean Owen <sr...@gmail.com>.
FWIW I have been profiling the CF code a lot lately so think it has had a
recent, close look. I might turn profilers to areas that haven't had as
close a look.

The CF unit tests are fine if lagging in completeness. I think it would be
perhaps relatively simpler to locate some untested bits and add a test or
two - if you can run EMMA on the current tests you will quickly find a few
gaps.

I think as a result of trying to test some code you will locate some useful
ways to refactor and structure the code. That sort of thing strikes me as
important now. Best to make some big code moves early while the API is
understood to be very in flux, and to build a solid foundation.

For instance this week I would like to continue by proposing to merge and
reshuffle the utils and common packages.

On Sep 14, 2009 5:33 PM, "Grant Ingersoll" <gs...@apache.org> wrote:

On Sep 14, 2009, at 12:23 PM, Tanton Gibbs wrote:
> Hi,
>
> I'd like to start working more with th...

Testing and profiling of the clustering, classification and collab filtering
code would be very welcome.   There are several open issues in JIRA related
to these (MAHOUT-165 comes to mind).

I think just running some examples at scale and reporting back results would
be great as well.  You can also start by looking at
https://issues.apache.org/jira/browse/MAHOUT

One idea is to take the Wikipedia examples I put up at
https://www.ibm.com/developerworks/java/library/j-mahout/index.html (I will
donate the code soon) and try running them at larger scale for Wikipedia.

Re: Areas needing help

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 14, 2009, at 12:23 PM, Tanton Gibbs wrote:

> Hi,
>
> I'd like to start working more with the mahout code, making small
> improvements here and there.  I want to primarily focus on performance
> improvements and unit testing (mainly because I enjoy doing that).
> However, I'd like to improve a place that needs improvement.  If you
> know of a section of code that you would like to see refactored/sped
> up/tested could you please send it to the list or to me?  Or, if there
> is a wiki page on this, please point me to it and accept my apologies.
>

Testing and profiling of the clustering, classification and collab
filtering code would be very welcome.  There are several open issues
in JIRA related to these (MAHOUT-165 comes to mind).

I think just running some examples at scale and reporting back results
would be great as well.  You can also start by looking at
https://issues.apache.org/jira/browse/MAHOUT

One idea is to take the Wikipedia examples I put up at
https://www.ibm.com/developerworks/java/library/j-mahout/index.html
(I will donate the code soon) and try running them at larger scale
for Wikipedia.