You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Jason Rennie <jr...@gmail.com> on 2008/03/17 21:00:03 UTC

matrix implementation tradeoffs: space and time

Realized I should have been using IntKeyDoubleChainedHashMap instead of
IntKeyDoubleOpenHashMap for comparison vs. java.util.HashMap since the
HashMap implementation is almost certainly a Chained Hash Table.  Chained
beats Open, but HashMap still wins by a small margin, again showing better
results when load is high (50%).  See below for details.

I also did some testing of heap space using using the technique described by
Vladimir Roubtsov in this article:
http://www.javaworld.com/javaworld/javatips/jw-javatip130.html

The primitive versions save space, but the difference isn't as large as I
would have expected:

HashMap<Integer, Double> - 691127 bytes for 10000 entries, or appx. 69
bytes/entry
IntKeyDoubleChainedHashMap - 400092 bytes for 10000 entries, or appx. 40
bytes/entry
IntKeyDoubleOpenHashMap - 266400 bytes for 10000 entries, or appx. 27
bytes/entry

I initialized all the maps w/ a capacity of 20000.  I used random values to
populate.  HashMap would see some duplicate value savings and you'd need a
larger capacity to bring the primitive OpenHashMap up to
java.util.HashMapspeed.  So, at best, a primitive map would use half
the space of
java.util.HashMap.

Jason

Speed test details:

    /**
     * <ul>
     * <li> Finished HashMap dot product in 3.255 seconds.
     * <li> Finished PrimitiveOpenMap dot product in 4.010 seconds.
     * <li> Finished PrimitiveChainedMap dot product in 3.251 seconds.
     * <li> Finished CRS dot product in 1.202 seconds.
     * <li> Finished SW base time in 0.548 seconds.
     * <li> numTrials=500000 vectorSize=1000 nnz1=100 nnz2=100
loadFactor=25%
     * </ul>
     * <ul>
     * <li> Finished HashMap dot product in 2.117 seconds.
     * <li> Finished PrimitiveOpenMap dot product in 2.136 seconds.
     * <li> Finished PrimitiveChainedMap dot product in 1.919 seconds.
     * <li> Finished CRS dot product in 1.138 seconds.
     * <li> Finished SW base time in 0.568 seconds.
     * <li> numTrials=500000 vectorSize=1000 nnz1=200 nnz2=50 loadFactor=25%
     * </ul>
     * <ul>
     * <li> Finished HashMap dot product in 3.012 seconds.
     * <li> Finished PrimitiveOpenMap dot product in 4.245 seconds.
     * <li> Finished PrimitiveChainedMap dot product in 3.316 seconds.
     * <li> Finished CRS dot product in 1.243 seconds.
     * <li> Finished SW base time in 0.557 seconds.
     * <li> numTrials=500000 vectorSize=1000 nnz1=100 nnz2=100
loadFactor=50%
     * </ul>
     * <ul>
     * <li> Finished HashMap dot product in 1.999 seconds.
     * <li> Finished PrimitiveOpenMap dot product in 2.354 seconds.
     * <li> Finished PrimitiveChainedMap dot product in 2.235 seconds.
     * <li> Finished CRS dot product in 1.157 seconds.
     * <li> Finished SW base time in 0.568 seconds.
     * <li> numTrials=500000 vectorSize=1000 nnz1=200 nnz2=50 loadFactor=50%
     * </ul>
     */