Posted to commits@mahout.apache.org by co...@apache.org on 2010/04/03 20:47:00 UTC

[CONF] Apache Lucene Mahout > mahout-collections

Space: Apache Lucene Mahout (http://cwiki.apache.org/confluence/display/MAHOUT)
Page: mahout-collections (http://cwiki.apache.org/confluence/display/MAHOUT/mahout-collections)

Added by Benson Margulies:
---------------------------------------------------------------------
h1. Introduction

The Mahout Collections library is a set of container classes that address some limitations of the standard collections in Java. [This presentation|http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html/$FILE/oopsla08%20memory-efficient%20java%20slides.pdf] describes a number of performance problems with the standard collections. 

Mahout Collections addresses two of the more glaring problems: the lack of support for primitive types and the lack of open-addressed hash tables.

h1. Primitive Types

The most visible feature of Mahout Collections is its large set of containers specialized for primitive types. Java generics cannot be parameterized over primitive types, so the only efficient way to handle primitives is to provide a separate class for each type. There are therefore ArrayList-like containers for all of the primitive types, and hash maps for all the useful combinations of primitive type and object keys and values.

These classes do not, in general, implement interfaces from {{java.util}}. Even when the {{java.util}} interfaces could be type-compatible, they tend to include requirements that are not consistent with efficient use of primitive types.
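As a rough illustration of the idea, the following sketch shows an ArrayList-like container for {{int}} that stores its elements in a plain {{int[]}} and does not implement {{java.util.List}}. The class name {{SketchIntList}} and its methods are hypothetical and exist only for this example; this is not the library's implementation.

```java
import java.util.Arrays;

// Hypothetical illustration only; not a Mahout Collections class.
final class SketchIntList {
    private int[] elements = new int[8]; // values are stored unboxed
    private int size;

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    // Returns a primitive int, so no Integer object is ever created.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SketchIntList list = new SketchIntList();
        for (int i = 0; i < 20; i++) {
            list.add(i * i);
        }
        System.out.println(list.get(19) + " " + list.size());
    }
}
```

Implementing {{java.util.List<Integer>}} instead would force {{get}} to return an {{Integer}}, which reintroduces the boxing that the primitive container is meant to avoid.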

h1. Open Addressing

All of the sets and maps in Mahout Collections are open-addressed hash tables. Open addressing has a much smaller memory footprint than chaining, because entries live directly in the table's arrays rather than in separately allocated node objects. Since the purpose of these collections is to avoid the memory cost of autoboxing, open addressing is consistent with that goal.
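A minimal sketch of an open-addressed map from {{int}} to {{int}} may make the memory argument concrete. It assumes linear probing and a fixed capacity for simplicity; the probing strategy, resizing policy, and removal support in the library itself may differ, and the name {{SketchIntIntMap}} is hypothetical.

```java
// Hypothetical illustration only; not a Mahout Collections class.
final class SketchIntIntMap {
    private final int[] keys;     // keys stored inline, no entry objects
    private final int[] values;   // values stored inline
    private final boolean[] used; // marks occupied slots
    private int size;

    SketchIntIntMap(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    // Linear probing: start at the hashed slot and walk forward until the
    // key or an empty slot is found. One slot is always kept empty, so the
    // loop terminates.
    private int slotFor(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(int key, int value) {
        int i = slotFor(key);
        if (!used[i]) {
            if (size == keys.length - 1) {
                throw new IllegalStateException("table full");
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(int key, int defaultValue) {
        int i = slotFor(key);
        return used[i] ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }
}
```

The whole table is three arrays regardless of how many entries it holds. A chained table would allocate one node object per entry, and a {{java.util.HashMap<Integer, Integer>}} would additionally box each key and value.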

h1. Sets

Mahout Collections includes open-addressed hash sets. Unlike {{java.util.HashSet}}, which is backed by a {{HashMap}}, a set here is not a recycled hash table; the sets are separately implemented and use no additional storage for unused values.
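The point about storage can be sketched with a set of {{int}} that keeps only a key array and an occupancy array, with no value array at all. This is an assumption-laden illustration: the name {{SketchIntSet}}, the use of linear probing, and the growth policy (double when more than half full) are choices made for the example, not a description of the library's code.

```java
// Hypothetical illustration only; not a Mahout Collections class.
final class SketchIntSet {
    private int[] keys = new int[16];         // the only per-element storage
    private boolean[] used = new boolean[16]; // marks occupied slots
    private int size;

    // Returns true if the key was not already present.
    boolean add(int key) {
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the table at most half full
        }
        return insert(key);
    }

    boolean contains(int key) {
        return used[probe(key)];
    }

    int size() {
        return size;
    }

    // Linear probing; terminates because the table is never full.
    private int probe(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    private boolean insert(int key) {
        int i = probe(key);
        if (used[i]) {
            return false;
        }
        used[i] = true;
        keys[i] = key;
        size++;
        return true;
    }

    // Doubles the table and reinserts every existing key.
    private void grow() {
        int[] oldKeys = keys;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                insert(oldKeys[i]);
            }
        }
    }
}
```

By contrast, {{java.util.HashSet}} stores a placeholder value alongside every key in its backing map.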

h1. Credit where Credit is due

The implementation of Mahout Collections is derived from [CERN Colt|http://acs.lbl.gov/~hoschek/colt/].







