Posted to user@mahout.apache.org by Marko Ciric <ci...@gmail.com> on 2011/06/23 15:25:27 UTC

Mahout and Kolt

How similar are Mahout collections (like FastMap) with Kolt (cern.kolt)?


--
Marko Ćirić
ciric.marko@gmail.com

Re: Mahout and Kolt

Posted by Ken Krugler <kk...@transpac.com>.
On Jun 23, 2011, at 7:21am, Sean Owen wrote:

> Colt? the mahout-math and mahout-collections code are based on that.
> 
> However FastMap isn't -- that's my own creation from the old collaborative
> filtering framework.
> 
> But recently there has been talk about switching all of this to use fastutil

We use the fastutil native set/map implementations in a number of projects, and they work well.

The issues we've run into are:

1. The full jar is pretty big: 12MB.

For example, they ship a separate map class for every n×m combination of primitive key and value types.

So we often wind up pulling out just the classes we need.

2. Wrapping these classes with Hadoop serialization can be slow (they support Java serialization)

Not sure if that would be a factor, but serializing a 20M entry map takes some time.
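
To make point 2 concrete, here is a minimal sketch of the kind of hand-rolled serialization that avoids the Java serialization path: it writes an entry count followed by raw int/float pairs. Everything in it is an assumption for illustration. IntFloatEntries is a hypothetical class, not part of fastutil, Hadoop, or Mahout; it stores the map as parallel arrays to stay self-contained, and its write/readFields methods only mirror the shape of Hadoop's Writable interface without implementing it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical class for illustration; not part of fastutil, Hadoop, or Mahout.
final class IntFloatEntries {
    private int[] keys;
    private float[] values;

    IntFloatEntries(int[] keys, float[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        this.keys = keys;
        this.values = values;
    }

    int[] keys() { return keys; }

    float[] values() { return values; }

    // Same shape as Writable.write(DataOutput): entry count, then raw key/value pairs.
    void write(DataOutput out) throws IOException {
        out.writeInt(keys.length);
        for (int i = 0; i < keys.length; i++) {
            out.writeInt(keys[i]);
            out.writeFloat(values[i]);
        }
    }

    // Same shape as Writable.readFields(DataInput): replaces the current contents.
    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        int[] k = new int[n];
        float[] v = new float[n];
        for (int i = 0; i < n; i++) {
            k[i] = in.readInt();
            v[i] = in.readFloat();
        }
        keys = k;
        values = v;
    }

    // Convenience wrapper: 4 bytes for the count plus 8 bytes per entry.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(4 + 8 * keys.length);
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience wrapper: rebuilds the entries from bytes produced by toBytes().
    static IntFloatEntries fromBytes(byte[] bytes) {
        IntFloatEntries result = new IntFloatEntries(new int[0], new float[0]);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            result.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With a real fastutil map, the same idea would mean iterating its entries inside write and re-inserting them inside readFields; whether that is fast enough for a 20M entry map would need measuring.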

-- Ken

> On Thu, Jun 23, 2011 at 2:25 PM, Marko Ciric <ci...@gmail.com> wrote:
> 
>> How similar are Mahout collections (like FastMap) with Kolt (cern.kolt)?
>> 
>> 
>> --
>> Marko Ćirić
>> ciric.marko@gmail.com
>> 

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions

Re: Mahout and Kolt

Posted by Ted Dunning <te...@gmail.com>.
We changed lots of names as we pulled them over.  We also added test cases.

The changes at this point are pretty substantial.  At the lower level, we
changed the way things worked and added new kinds of collections.  At the
math layer, we pretty massively changed things by adding the ability to
create new types of vectors and matrices more efficiently and easily than
with Colt and by deleting vats of code.  Pretty much all that survived was
the API style and some of the higher level algorithms like probability
distributions.  The Vector and Matrix classes are very different internally
from Colt's.

On Thu, Jun 23, 2011 at 7:26 AM, Marko Ciric <ci...@gmail.com> wrote:

> Thanks.
> I was just browsing the Kolt source and haven't found them. Just asking.
>
>
> On 06/23/2011 04:21 PM, Sean Owen wrote:
>
>> Colt? the mahout-math and mahout-collections code are based on that.
>>
>> However FastMap isn't -- that's my own creation from the old collaborative
>> filtering framework.
>>
>> But recently there has been talk about switching all of this to use
>> fastutil
>> (?)
>>
>> On Thu, Jun 23, 2011 at 2:25 PM, Marko Ciric<ci...@gmail.com>
>>  wrote:
>>
>>  How similar are Mahout collections (like FastMap) with Kolt (cern.kolt)?
>>>
>>>
>>> --
>>> Marko Ćirić
>>> ciric.marko@gmail.com
>>>
>>>
>

Re: Mahout and Kolt

Posted by Marko Ciric <ci...@gmail.com>.
Thanks.
I was just browsing the Kolt source and haven't found them. Just asking.

On 06/23/2011 04:21 PM, Sean Owen wrote:
> Colt? the mahout-math and mahout-collections code are based on that.
>
> However FastMap isn't -- that's my own creation from the old collaborative
> filtering framework.
>
> But recently there has been talk about switching all of this to use fastutil
> (?)
>
> On Thu, Jun 23, 2011 at 2:25 PM, Marko Ciric<ci...@gmail.com>  wrote:
>
>> How similar are Mahout collections (like FastMap) with Kolt (cern.kolt)?
>>
>>
>> --
>> Marko Ćirić
>> ciric.marko@gmail.com
>>


Re: Mahout and Kolt

Posted by Sean Owen <sr...@gmail.com>.
Colt? the mahout-math and mahout-collections code are based on that.

However FastMap isn't -- that's my own creation from the old collaborative
filtering framework.

But recently there has been talk about switching all of this to use fastutil
(?)

On Thu, Jun 23, 2011 at 2:25 PM, Marko Ciric <ci...@gmail.com> wrote:

> How similar are Mahout collections (like FastMap) with Kolt (cern.kolt)?
>
>
> --
> Marko Ćirić
> ciric.marko@gmail.com
>