Posted to dev@mahout.apache.org by Benson Margulies <bi...@gmail.com> on 2010/01/16 18:33:17 UTC

A modest proposal for the Carrot integration

I propose a branch. Diffs from the branch to the trunk can still be
posted on the JIRA, but I think that a branch would be worthwhile in
facilitating collaboration.

I volunteer to fight with the maven-release-plugin to make it.

Re: A modest proposal for the Carrot integration

Posted by Ted Dunning <te...@gmail.com>.
I try to never say anything that decreases the output of a very productive
person.  I often fail, but I try.

On Sat, Jan 16, 2010 at 10:11 AM, Benson Margulies <bi...@gmail.com> wrote:

> Sure you could. The 'refine patches attached to JIRA' approach is the
> classic Lucene project methodology, and I'm the new kid on the block
> here.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: A modest proposal for the Carrot integration

Posted by Benson Margulies <bi...@gmail.com>.
Sure you could. The 'refine patches attached to JIRA' approach is the
classic Lucene project methodology, and I'm the new kid on the block
here.

On Sat, Jan 16, 2010 at 12:50 PM, Ted Dunning <te...@gmail.com> wrote:
> How can we say no?
>
> On Sat, Jan 16, 2010 at 9:33 AM, Benson Margulies <bi...@gmail.com> wrote:
>
>> I volunteer to fight with the maven-release-plugin to make it.
>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: A modest proposal for the Carrot integration

Posted by Ted Dunning <te...@gmail.com>.
How can we say no?

On Sat, Jan 16, 2010 at 9:33 AM, Benson Margulies <bi...@gmail.com> wrote:

> I volunteer to fight with the maven-release-plugin to make it.




-- 
Ted Dunning, CTO
DeepDyve

Re: A modest proposal for the Carrot integration

Posted by Dawid Weiss <da...@gmail.com>.
> I'm not quite done with Colt.

No, no -- you misunderstood me. Let's work in parallel: I'll
try to polish the edges of HPPC in those places that I know are not
exactly the way I feel they should be, and you finish Colt's
integration -- having an Apache-licensed Colt has value on its own. I
will provide a cleaner patch, but branching is a good idea since
moving from Colt collections may require major code sweeps and we
don't want everyone to suffer because of this.

I think I should be done with this "cleaner" HPPC release by
Wednesday, if it's all right.

D.

>
> If you think you can refine a patch to go straight into the mahout
> trunk, don't let me stop you.
>
>
> On Sat, Jan 16, 2010 at 3:48 PM, Dawid Weiss <da...@gmail.com> wrote:
>> Have you finished with Colt? I think this is still worth completing
>> before we proceed to HPPC. Just talked to Staszek, we will move HPPC
>> code to Carrot2 labs SVN repository (sourceforge) because we want to
>> get rid of PCJ as soon as possible and need something versioned and
>> sticky. I plan to make a few additions to HPPC that I could work on
>> while you're completing the Colt stuff. Hopefully we can also get this
>> ArrayIndexOutOfBounds beast in the meantime.
>>
>> If you're done with Colt, I can commit directly to Mahout's branch and
>> work from there.
>>
>> Dawid
>>
>

Re: A modest proposal for the Carrot integration

Posted by Benson Margulies <bi...@gmail.com>.
I'm not quite done with Colt.

If you think you can refine a patch to go straight into the mahout
trunk, don't let me stop you.


On Sat, Jan 16, 2010 at 3:48 PM, Dawid Weiss <da...@gmail.com> wrote:
> Have you finished with Colt? I think this is still worth completing
> before we proceed to HPPC. Just talked to Staszek, we will move HPPC
> code to Carrot2 labs SVN repository (sourceforge) because we want to
> get rid of PCJ as soon as possible and need something versioned and
> sticky. I plan to make a few additions to HPPC that I could work on
> while you're completing the Colt stuff. Hopefully we can also get this
> ArrayIndexOutOfBounds beast in the meantime.
>
> If you're done with Colt, I can commit directly to Mahout's branch and
> work from there.
>
> Dawid
>

Re: A modest proposal for the Carrot integration

Posted by Dawid Weiss <da...@gmail.com>.
Have you finished with Colt? I think this is still worth completing
before we proceed to HPPC. Just talked to Staszek, we will move HPPC
code to Carrot2 labs SVN repository (sourceforge) because we want to
get rid of PCJ as soon as possible and need something versioned and
sticky. I plan to make a few additions to HPPC that I could work on
while you're completing the Colt stuff. Hopefully we can also get this
ArrayIndexOutOfBounds beast in the meantime.

If you're done with Colt, I can commit directly to Mahout's branch and
work from there.

Dawid

Re: A modest proposal for the Carrot integration

Posted by Benson Margulies <bi...@gmail.com>.
On Sat, Jan 16, 2010 at 1:15 PM, Dawid Weiss <da...@gmail.com> wrote:
>> I propose a branch. Diffs from the branch to the trunk can still be
>> posted on the JIRA, but I think that a branch would be worthwhile in
>> facilitating collaboration.
>
> Do you mean -- for merging with the code I posted earlier?

Yes. To be specific:

1) make a branch
2) in the branch, make a module for HPPC, and check in.
3) in the branch, fiddle the other math code to use HPPC instead of
the Colt collections.
4) Stir vigorously until the sort of thing you're reporting is dealt with.
5) Patch across to the trunk.

Re: A modest proposal for the Carrot integration

Posted by Dawid Weiss <da...@gmail.com>.
> I propose a branch. Diffs from the branch to the trunk can still be
> posted on the JIRA, but I think that a branch would be worthwhile in
> facilitating collaboration.

Do you mean -- for merging with the code I posted earlier?

By the way, I've integrated Colt from Mahout with our code base.
Interesting things started to happen. First, we had this:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.mahout.math.matrix.doublealgo.Sorting$4.compare(Sorting.java:214)
        at org.apache.mahout.math.Sorting.quickSort0(Sorting.java:725)
        at org.apache.mahout.math.Sorting.quickSort0(Sorting.java:773)
        at org.apache.mahout.math.Sorting.quickSort(Sorting.java:662)
        at org.apache.mahout.math.matrix.doublealgo.Sorting.runSort(Sorting.java:80)
        at org.apache.mahout.math.matrix.doublealgo.Sorting.sort(Sorting.java:236)
        at org.carrot2.matrix.factorization.IterativeMatrixFactorizationBase.order(IterativeMatrixFactorizationBase.java:149)

When we added debugging statements, the exception was gone. After a
(longer) while, I checked for VM bugs. Yes, that was it -- there was a
bug in the release of Sun's JVM 1.5 that we had on our server (for
running 1.5-compliance builds). We upgraded that release and... we
still have random exceptions with the above stack. More -- we have
them with the newest 1.6 as well... Adding debugging statements makes
the builds pass with flying colors. The bug only happens on one machine
(which does have memory correction and is server-class hardware).

In other words -- I've no idea what is happening.

D.
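
One possible way to get more information out of a failure like the one in the stack trace above is to wrap the comparator handed to the sort in a bounds-checking guard, so that a bad index fails with both arguments recorded instead of a bare -1. This is only a sketch under assumptions: the IntComparator interface and the GuardedComparator class below are hypothetical stand-ins written for illustration, not Mahout's or Colt's actual types, and it assumes the -1 reaches the comparator as an argument rather than being computed inside it. Since adding debugging statements made the exception disappear, a guard like this may well perturb the JIT-compiled code in the same way and hide the problem too.

```java
public final class GuardedComparator {

    /** Hypothetical stand-in for an index-based comparator; not Mahout's actual interface. */
    public interface IntComparator {
        int compare(int a, int b);
    }

    /** Wraps delegate so that out-of-range indices fail with both arguments recorded. */
    public static IntComparator guard(final IntComparator delegate, final int length) {
        return new IntComparator() {
            public int compare(int a, int b) {
                if (a < 0 || a >= length || b < 0 || b >= length) {
                    throw new IllegalStateException(
                        "comparator called with a=" + a + ", b=" + b + ", length=" + length);
                }
                return delegate.compare(a, b);
            }
        };
    }

    public static void main(String[] args) {
        final double[] values = {3.0, 1.0, 2.0};
        // Compares two positions by the values stored at those positions.
        IntComparator byValue = new IntComparator() {
            public int compare(int a, int b) {
                return Double.compare(values[a], values[b]);
            }
        };
        IntComparator guarded = guard(byValue, values.length);
        // In-range indices are passed straight through to the delegate.
        System.out.println(guarded.compare(1, 0) < 0);
        try {
            guarded.compare(-1, 0); // out of range: the guard throws before the array is touched
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Anonymous classes are used instead of lambdas so the sketch also compiles on the 1.5 and 1.6 JVMs mentioned in the thread.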