You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mahout.apache.org by ra...@apache.org on 2018/06/29 16:10:49 UTC
[05/18] mahout git commit: MAHOUT-2033 Fixed Map-Reduce Refactor

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/list/package-info.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/list/package-info.java b/core/src/main/java/org/apache/mahout/math/list/package-info.java
deleted file mode 100644
index 43b5c4b..0000000
--- a/core/src/main/java/org/apache/mahout/math/list/package-info.java
+++ /dev/null
@@ -1,144 +0,0 @@
-/**
- * <HTML>
- * <BODY>
- * Resizable lists holding objects or primitive data types such as <tt>int</tt>,
- * <tt>double</tt>, etc. For non-resizable lists (1-dimensional matrices) see
- * package <code>org.apache.mahout.math.matrix</code>.<p></p>
- * <h1><a name="Overview"></a>Getting Started</h1>
- * <h2>1. Overview</h2>
- * <p>The list package offers flexible object oriented abstractions modelling dynamically
- * resizing lists holding objects or primitive data types such as <tt>int</tt>,
- * <tt>double</tt>, etc. It is designed to be scalable in terms of performance
- * and memory requirements.</p>
- * <p>Features include: </p>
- * <p></p>
- * <ul>
- * <li>Lists operating on objects as well as all primitive data types such as <tt>int</tt>,
- * <tt>double</tt>, etc.
- * </li>
- * <li>Compact representations</li>
- * <li>A number of general purpose list operations including: adding, inserting,
- * removing, iterating, searching, sorting, extracting ranges and copying. All
- * operations are designed to perform well on mass data.
- * </li>
- * <li>Support for quick access to list elements. This is achieved by bounds-checking
- * and non-bounds-checking accessor methods as well as zero-copy transformations
- * to primitive arrays such as <tt>int[]</tt>, <tt>double[]</tt>, etc.
- * </li>
- * <li>Allows to use high level algorithms on primitive data types without any
- * space and time overhead. Operations on primitive arrays, Colt lists and JAL
- * algorithms can freely be mixed at zero copy overhead.
- * </li>
- * </ul>
- * <p>File-based I/O can be achieved through the standard Java built-in serialization
- * mechanism. All classes implement the {@link java.io.Serializable} interface.
- * However, the toolkit is entirely decoupled from advanced I/O. It provides data
- * structures and algorithms only.
- * <p> This toolkit borrows concepts and terminology from the Javasoft <a
- * href="http://www.javasoft.com/products/jdk/1.2/docs/guide/collections/index.html">
- * Collections framework</a> written by Josh Bloch and introduced in JDK 1.2.
- * <h2>2. Introduction</h2>
- * <p>Lists are fundamental to virtually any application. Large scale resizable lists
- * are, for example, used in scientific computations, simulations database management
- * systems, to name just a few.</p>
- * <h2></h2>
- * <p>A list is a container holding elements that can be accessed via zero-based
- * indexes. Lists may be implemented in different ways (most commonly with arrays).
- * A resizable list automatically grows as elements are added. The lists of this
- * package do not automatically shrink. Shrinking needs to be triggered by explicitly
- * calling <tt>trimToSize()</tt> methods.</p>
- * <p><i>Growing policy</i>: A list implemented with arrays initially has a certain
- * <tt>initialCapacity</tt> - per default 10 elements, but customizable upon instance
- * construction. As elements are added, this capacity may nomore be sufficient.
- * When a list is automatically grown, its capacity is expanded to <tt>1.5*currentCapacity</tt>.
- * Thus, excessive resizing (involving copying) is avoided.</p>
- * <h4>Copying</h4>
- * <p>
- * <p>Any list can be copied. A copy is <i>equal</i> to the original but entirely
- * independent of the original. So changes in the copy are not reflected in the
- * original, and vice-versa.
- * <h2>3. Organization of this package</h2>
- * <p>Class naming follows the schema <tt>&lt;ElementType&gt;&lt;ImplementationTechnique&gt;List</tt>.
- * For example, we have a {@link org.apache.mahout.math.list.DoubleArrayList}, which is a list
- * holding <tt>double</tt> elements implemented with <tt>double</tt>[] arrays.
- * </p>
- * <p>The classes for lists of a given value type are derived from a common abstract
- * base class tagged <tt>Abstract&lt;ElementType&gt;</tt><tt>List</tt>. For example,
- * all lists operating on <tt>double</tt> elements are derived from
- * {@link org.apache.mahout.math.list.AbstractDoubleList},
- * which in turn is derived from an abstract base class tying together all lists
- * regardless of value type, {@link org.apache.mahout.math.list.AbstractList}. The abstract
- * base classes provide skeleton implementations for all but few methods. Experimental
- * data layouts (such as compressed, sparse, linked, etc.) can easily be implemented
- * and inherit a rich set of functionality. Have a look at the javadoc <a href="package-tree.html">tree
- * view</a> to get the broad picture.</p>
- * <h2>4. Example usage</h2>
- * <p>The following snippet fills a list, randomizes it, extracts the first half
- * of the elements, sums them up and prints the result. It is implemented entirely
- * with accessor methods.</p>
- * <table>
- * <td class="PRE">
- * <pre>
- * int s = 1000000;<br>AbstractDoubleList list = new DoubleArrayList();
- * for (int i=0; i&lt;s; i++) { list.add((double)i); }
- * list.shuffle();
- * AbstractDoubleList part = list.partFromTo(0,list.size()/2 - 1);
- * double sum = 0.0;
- * for (int i=0; i&lt;part.size(); i++) { sum += part.get(i); }
- * log.info(sum);
- * </pre>
- * </td>
- * </table>
- * <p> For efficiency, all classes provide back doors to enable getting/setting the
- * backing array directly. In this way, the high level operations of these classes
- * can be used where appropriate, and one can switch to <tt>[]</tt>-array index
- * notations where necessary. The key methods for this are <tt>public &lt;ElementType&gt;[]
- * elements()</tt> and <tt>public void elements(&lt;ElementType&gt;[])</tt>. The
- * former trustingly returns the array it internally keeps to store the elements.
- * Holding this array in hand, we can use the <tt>[]</tt>-array operator to
- * perform iteration over large lists without needing to copy the array or paying
- * the performance penalty introduced by accessor methods. Alternatively any JAL
- * algorithm (or other algorithm) can operate on the returned primitive array.
- * The latter method forces a list to internally hold a user provided array. Using
- * this approach one can avoid needing to copy the elements into the list.
- * <p>As a consequence, operations on primitive arrays, Colt lists and JAL algorithms
- * can freely be mixed at zero-copy overhead.
- * <p> Note that such special treatment certainly breaks encapsulation. This functionality
- * is provided for performance reasons only and should only be used when absolutely
- * necessary. Here is the above example in mixed notation:
- * <table>
- * <td class="PRE">
- * <pre>
- * int s = 1000000;<br>DoubleArrayList list = new DoubleArrayList(s); // list.size()==0, capacity==s
- * list.setSize(s); // list.size()==s<br>double[] values = list.elements();
- * // zero copy, values.length==s<br>for (int i=0; i&lt;s; i++) { values[i]=(double)i; }
- * list.shuffle();
- * double sum = 0.0;
- * int limit = values.length/2;
- * for (int i=0; i&lt;limit; i++) { sum += values[i]; }
- * log.info(sum);
- * </pre>
- * </td>
- * </table>
- * <p> Or even more compact using lists as algorithm objects:
- * <table>
- * <td class="PRE">
- * <pre>
- * int s = 1000000;<br>double[] values = new double[s];
- * for (int i=0; i&lt;s; i++) { values[i]=(double)i; }
- * new DoubleArrayList(values).shuffle(); // zero-copy, shuffle via back door
- * double sum = 0.0;
- * int limit = values.length/2;
- * for (int i=0; i&lt;limit; i++) { sum += values[i]; }
- * log.info(sum);
- * </pre>
- * </td>
- * </table>
- * <p>
- * <h2>5. Notes </h2>
- * <p>The quicksorts and mergesorts are the JDK 1.2 V1.26 algorithms, modified as
- * necessary to operate on the given data types.
- * </BODY>
- * </HTML>
- */
-package org.apache.mahout.math.list;

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/map/HashFunctions.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/map/HashFunctions.java b/core/src/main/java/org/apache/mahout/math/map/HashFunctions.java
deleted file mode 100644
index b749307..0000000
--- a/core/src/main/java/org/apache/mahout/math/map/HashFunctions.java
+++ /dev/null
@@ -1,115 +0,0 @@
-/*
-Copyright 1999 CERN - European Organization for Nuclear Research.
-Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
-is hereby granted without fee, provided that the above copyright notice appear in all copies and 
-that both that copyright notice and this permission notice appear in supporting documentation. 
-CERN makes no representations about the suitability of this software for any purpose. 
-It is provided "as is" without expressed or implied warranty.
-*/
-package org.apache.mahout.math.map;
-
-
-/**
- * Provides various hash functions.
- */
-public final class HashFunctions {
-
-  /**
-   * Utility class pattern: all static members, no inheritance.
-   */
-  private HashFunctions() {
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(char value) {
-    return value;
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(double value) {
-    long bits = Double.doubleToLongBits(value);
-    return (int) (bits ^ (bits >>> 32));
-
-    //return (int) Double.doubleToLongBits(value*663608941.737);
-    // this avoids excessive hashCollisions in the case values are of the form (1.0, 2.0, 3.0, ...)
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(float value) {
-    return Float.floatToIntBits(value * 663608941.737f);
-    // this avoids excessive hashCollisions in the case values are of the form (1.0, 2.0, 3.0, ...)
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   * The hashcode computation is similar to the last step
-   * of MurMurHash3.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(int value) {
-    int h = value;
-    h ^= h >>> 16;
-    h *= 0x85ebca6b;
-    h ^= h >>> 13;
-    h *= 0xc2b2ae35;
-    h ^= h >>> 16;
-    return h;
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(long value) {
-    return (int) (value ^ (value >> 32));
-    /*
-    value &= 0x7FFFFFFFFFFFFFFFL; // make it >=0 (0x7FFFFFFFFFFFFFFFL==Long.MAX_VALUE)
-    int hashCode = 0;
-    do hashCode = 31*hashCode + (int) (value%10);
-    while ((value /= 10) > 0);
-
-    return 28629151*hashCode; // spread even further; h*31^5
-    */
-  }
-
-  /**
-   * Returns a hashcode for the specified object.
-   *
-   * @return a hash code value for the specified object.
-   */
-  public static int hash(Object object) {
-    return object == null ? 0 : object.hashCode();
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(short value) {
-    return value;
-  }
-
-  /**
-   * Returns a hashcode for the specified value.
-   *
-   * @return a hash code value for the specified value.
-   */
-  public static int hash(boolean value) {
-    return value ? 1231 : 1237;
-  }
-}

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/map/OpenHashMap.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/map/OpenHashMap.java b/core/src/main/java/org/apache/mahout/math/map/OpenHashMap.java
deleted file mode 100644
index 271abc1..0000000
--- a/core/src/main/java/org/apache/mahout/math/map/OpenHashMap.java
+++ /dev/null
@@ -1,654 +0,0 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-/*
-Copyright ï¿½ 1999 CERN - European Organization for Nuclear Research.
-Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
-is hereby granted without fee, provided that the above copyright notice appear in all copies and 
-that both that copyright notice and this permission notice appear in supporting documentation. 
-CERN makes no representations about the suitability of this software for any purpose. 
-It is provided "as is" without expressed or implied warranty.
-*/
-package org.apache.mahout.math.map;
-
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-// Error building Javadocs if java.util.Map.Entry not explictly included... (?)
-import java.util.Map.Entry;
-
-import org.apache.mahout.math.function.ObjectObjectProcedure;
-import org.apache.mahout.math.function.ObjectProcedure;
-import org.apache.mahout.math.set.AbstractSet;
-import org.apache.mahout.math.set.OpenHashSet;
-
-/**
-  * Open hash map. This implements Map, but it does not respect several aspects of the Map contract
-  * that impose the very sorts of performance penalities that this class exists to avoid.
-  * {@link #entrySet}, {@link #values}, and {@link #keySet()} do <strong>not</strong> return
-  * collections that share storage with the main map, and changes to those returned objects
-  * are <strong>not</strong> reflected in the container.
- **/
-public class OpenHashMap<K,V> extends AbstractSet implements Map<K,V> {
-  protected static final byte FREE = 0;
-  protected static final byte FULL = 1;
-  protected static final byte REMOVED = 2;
-  protected static final Object NO_KEY_VALUE = null;
-
-  /** The hash table keys. */
-  protected Object[] table;
-
-  /** The hash table values. */
-  protected Object[] values;
-
-  /** The state of each hash table entry (FREE, FULL, REMOVED). */
-  protected byte[] state;
-
-  /** The number of table entries in state==FREE. */
-  protected int freeEntries;
-
-
-  /** Constructs an empty map with default capacity and default load factors. */
-  public OpenHashMap() {
-    this(DEFAULT_CAPACITY);
-  }
-
-  /**
-   * Constructs an empty map with the specified initial capacity and default load factors.
-   *
-   * @param initialCapacity the initial capacity of the map.
-   * @throws IllegalArgumentException if the initial capacity is less than zero.
-   */
-  public OpenHashMap(int initialCapacity) {
-    this(initialCapacity, DEFAULT_MIN_LOAD_FACTOR, DEFAULT_MAX_LOAD_FACTOR);
-  }
-
-  /**
-   * Constructs an empty map with the specified initial capacity and the specified minimum and maximum load factor.
-   *
-   * @param initialCapacity the initial capacity.
-   * @param minLoadFactor   the minimum load factor.
-   * @param maxLoadFactor   the maximum load factor.
-   * @throws IllegalArgumentException if <tt>initialCapacity < 0 || (minLoadFactor < 0.0 || minLoadFactor >= 1.0) ||
-   *                                  (maxLoadFactor <= 0.0 || maxLoadFactor >= 1.0) || (minLoadFactor >=
-   *                                  maxLoadFactor)</tt>.
-   */
-  public OpenHashMap(int initialCapacity, double minLoadFactor, double maxLoadFactor) {
-    setUp(initialCapacity, minLoadFactor, maxLoadFactor);
-  }
-
-  /** Removes all (key,value) associations from the receiver. Implicitly calls <tt>trimToSize()</tt>. */
-  @Override
-  public void clear() {
-    Arrays.fill(this.state, FREE);
-    distinct = 0;
-    freeEntries = table.length; // delta
-    trimToSize();
-  }
-
-  /**
-   * Returns a deep copy of the receiver.
-   *
-   * @return a deep copy of the receiver.
-   */
-  @Override
-  @SuppressWarnings("unchecked")
-  public Object clone() {
-    OpenHashMap<K,V> copy = (OpenHashMap<K,V>) super.clone();
-    copy.table = copy.table.clone();
-    copy.values = copy.values.clone();
-    copy.state = copy.state.clone();
-    return copy;
-  }
-
-  /**
-   * Returns <tt>true</tt> if the receiver contains the specified key.
-   *
-   * @return <tt>true</tt> if the receiver contains the specified key.
-   */
-  @SuppressWarnings("unchecked")
-  @Override
-  public boolean containsKey(Object key) {
-    return indexOfKey((K)key) >= 0;
-  }
-
-  /**
-   * Returns <tt>true</tt> if the receiver contains the specified value.
-   *
-   * @return <tt>true</tt> if the receiver contains the specified value.
-   */
-  @SuppressWarnings("unchecked")
-  @Override
-  public boolean containsValue(Object value) {
-    return indexOfValue((V)value) >= 0;
-  }
-
-  /**
-   * Ensures that the receiver can hold at least the specified number of associations without needing to allocate new
-   * internal memory. If necessary, allocates new internal memory and increases the capacity of the receiver. <p> This
-   * method never need be called; it is for performance tuning only. Calling this method before <tt>put()</tt>ing a
-   * large number of associations boosts performance, because the receiver will grow only once instead of potentially
-   * many times and hash collisions get less probable.
-   *
-   * @param minCapacity the desired minimum capacity.
-   */
-  @Override
-  public void ensureCapacity(int minCapacity) {
-    if (table.length < minCapacity) {
-      int newCapacity = nextPrime(minCapacity);
-      rehash(newCapacity);
-    }
-  }
-
-  /**
-   * Applies a procedure to each key of the receiver, if any. Note: Iterates over the keys in no particular order.
-   * Subclasses can define a particular order, for example, "sorted by key". All methods which <i>can</i> be expressed
-   * in terms of this method (most methods can) <i>must guarantee</i> to use the <i>same</i> order defined by this
-   * method, even if it is no particular order. This is necessary so that, for example, methods <tt>keys</tt> and
-   * <tt>values</tt> will yield association pairs, not two uncorrelated lists.
-   *
-   * @param procedure the procedure to be applied. Stops iteration if the procedure returns <tt>false</tt>, otherwise
-   *                  continues.
-   * @return <tt>false</tt> if the procedure stopped before all keys where iterated over, <tt>true</tt> otherwise.
-   */
-  @SuppressWarnings("unchecked")
-  public boolean forEachKey(ObjectProcedure<K> procedure) {
-    for (int i = table.length; i-- > 0;) {
-      if (state[i] == FULL && !procedure.apply((K)table[i])) {
-        return false;
-      }
-    }
-    return true;
-  }
-
-  /**
-   * Applies a procedure to each (key,value) pair of the receiver, if any. Iteration order is guaranteed to be
-   * <i>identical</i> to the order used by method {@link #forEachKey(ObjectProcedure)}.
-   *
-   * @param procedure the procedure to be applied. Stops iteration if the procedure returns <tt>false</tt>, otherwise
-   *                  continues.
-   * @return <tt>false</tt> if the procedure stopped before all keys where iterated over, <tt>true</tt> otherwise.
-   */
-  @SuppressWarnings("unchecked")
-  public boolean forEachPair(ObjectObjectProcedure<K,V> procedure) {
-    for (int i = table.length; i-- > 0;) {
-      if (state[i] == FULL && !procedure.apply((K)table[i], (V)values[i])) {
-        return false;
-      }
-    }
-    return true;
-  }
-
-  /**
-   * Returns the value associated with the specified key. It is often a good idea to first check with {@link
-   * #containsKey(Object)} whether the given key has a value associated or not, i.e. whether there exists an association
-   * for the given key or not.
-   *
-   * @param key the key to be searched for.
-   * @return the value associated with the specified key; <tt>0</tt> if no such key is present.
-   */
-  @SuppressWarnings("unchecked")
-  @Override
-  public V get(Object key) {
-    int i = indexOfKey((K)key);
-    if (i < 0) {
-      return null;
-    } //not contained
-    return (V)values[i];
-  }
-
-  /**
-   * @param key the key to be added to the receiver.
-   * @return the index where the key would need to be inserted, if it is not already contained. Returns -index-1 if the
-   *         key is already contained at slot index. Therefore, if the returned index < 0, then it is already contained
-   *         at slot -index-1. If the returned index >= 0, then it is NOT already contained and should be inserted at
-   *         slot index.
-   */
-  protected int indexOfInsertion(K key) {
-    Object[] tab = table;
-    byte[] stat = state;
-    int length = tab.length;
-
-    int hash = key.hashCode() & 0x7FFFFFFF;
-    int i = hash % length;
-    int decrement = hash % (length - 2); // double hashing, see http://www.eece.unm.edu/faculty/heileman/hash/node4.html
-    //int decrement = (hash / length) % length;
-    if (decrement == 0) {
-      decrement = 1;
-    }
-
-    // stop if we find a removed or free slot, or if we find the key itself
-    // do NOT skip over removed slots (yes, open addressing is like that...)
-    while (stat[i] == FULL && !equalsMindTheNull(key, tab[i])) {
-      i -= decrement;
-      //hashCollisions++;
-      if (i < 0) {
-        i += length;
-      }
-    }
-
-    if (stat[i] == REMOVED) {
-      // stop if we find a free slot, or if we find the key itself.
-      // do skip over removed slots (yes, open addressing is like that...)
-      // assertion: there is at least one FREE slot.
-      int j = i;
-      while (stat[i] != FREE && (stat[i] == REMOVED || tab[i] != key)) {
-        i -= decrement;
-        //hashCollisions++;
-        if (i < 0) {
-          i += length;
-        }
-      }
-      if (stat[i] == FREE) {
-        i = j;
-      }
-    }
-
-
-    if (stat[i] == FULL) {
-      // key already contained at slot i.
-      // return a negative number identifying the slot.
-      return -i - 1;
-    }
-    // not already contained, should be inserted at slot i.
-    // return a number >= 0 identifying the slot.
-    return i;
-  }
-
-  /**
-   * @param key the key to be searched in the receiver.
-   * @return the index where the key is contained in the receiver, returns -1 if the key was not found.
-   */
-  protected int indexOfKey(K key) {
-    Object[] tab = table;
-    byte[] stat = state;
-    int length = tab.length;
-
-    int hash = key.hashCode() & 0x7FFFFFFF;
-    int i = hash % length;
-    int decrement = hash % (length - 2); // double hashing, see http://www.eece.unm.edu/faculty/heileman/hash/node4.html
-    //int decrement = (hash / length) % length;
-    if (decrement == 0) {
-      decrement = 1;
-    }
-
-    // stop if we find a free slot, or if we find the key itself.
-    // do skip over removed slots (yes, open addressing is like that...)
-    while (stat[i] != FREE && (stat[i] == REMOVED || !equalsMindTheNull(key, tab[i]))) {
-      i -= decrement;
-      //hashCollisions++;
-      if (i < 0) {
-        i += length;
-      }
-    }
-
-    if (stat[i] == FREE) {
-      return -1;
-    } // not found
-    return i; //found, return index where key is contained
-  }
-
-  /**
-   * @param value the value to be searched in the receiver.
-   * @return the index where the value is contained in the receiver, returns -1 if the value was not found.
-   */
-  protected int indexOfValue(V value) {
-    Object[] val = values;
-    byte[] stat = state;
-
-    for (int i = stat.length; --i >= 0;) {
-      if (stat[i] == FULL && equalsMindTheNull(val[i], value)) {
-        return i;
-      }
-    }
-
-    return -1; // not found
-  }
-
-  /**
-   * Fills all keys contained in the receiver into the specified list. Fills the list, starting at index 0. After this
-   * call returns the specified list has a new size that equals <tt>this.size()</tt>. 
-   * This method can be used
-   * to iterate over the keys of the receiver.
-   *
-   * @param list the list to be filled, can have any size.
-   */
-  @SuppressWarnings("unchecked")
-  public void keys(List<K> list) {
-    list.clear();
-  
-
-    Object [] tab = table;
-    byte[] stat = state;
-
-    for (int i = tab.length; i-- > 0;) {
-      if (stat[i] == FULL) {
-        list.add((K)tab[i]);
-      }
-    }
-  }
-
-  /**
-   * Associates the given key with the given value. Replaces any old <tt>(key,someOtherValue)</tt> association, if
-   * existing.
-   *
-   * @param key   the key the value shall be associated with.
-   * @param value the value to be associated.
-   * @return <tt>true</tt> if the receiver did not already contain such a key; <tt>false</tt> if the receiver did
-   *         already contain such a key - the new value has now replaced the formerly associated value.
-   */
-  @SuppressWarnings("unchecked")
-  @Override
-  public V put(K key, V value) {
-    int i = indexOfInsertion(key);
-    if (i < 0) { //already contained
-      i = -i - 1;
-      V previous = (V) this.values[i];
-      this.values[i] = value;
-      return previous;
-    }
-
-    if (this.distinct > this.highWaterMark) {
-      int newCapacity = chooseGrowCapacity(this.distinct + 1, this.minLoadFactor, this.maxLoadFactor);
-      rehash(newCapacity);
-      return put(key, value);
-    }
-
-    this.table[i] = key;
-    this.values[i] = value;
-    if (this.state[i] == FREE) {
-      this.freeEntries--;
-    }
-    this.state[i] = FULL;
-    this.distinct++;
-
-    if (this.freeEntries < 1) { //delta
-      int newCapacity = chooseGrowCapacity(this.distinct + 1, this.minLoadFactor, this.maxLoadFactor);
-      rehash(newCapacity);
-    }
-
-    return null;
-  }
-
-  /**
-   * Rehashes the contents of the receiver into a new table with a smaller or larger capacity. This method is called
-   * automatically when the number of keys in the receiver exceeds the high water mark or falls below the low water
-   * mark.
-   */
-  @SuppressWarnings("unchecked")
-  protected void rehash(int newCapacity) {
-    int oldCapacity = table.length;
-    //if (oldCapacity == newCapacity) return;
-
-    Object[] oldTable = table;
-    Object[] oldValues = values;
-    byte[] oldState = state;
-
-    Object[] newTable = new Object[newCapacity];
-    Object[] newValues = new Object[newCapacity];
-    byte[] newState = new byte[newCapacity];
-
-    this.lowWaterMark = chooseLowWaterMark(newCapacity, this.minLoadFactor);
-    this.highWaterMark = chooseHighWaterMark(newCapacity, this.maxLoadFactor);
-
-    this.table = newTable;
-    this.values = newValues;
-    this.state = newState;
-    this.freeEntries = newCapacity - this.distinct; // delta
-
-    for (int i = oldCapacity; i-- > 0;) {
-      if (oldState[i] == FULL) {
-        Object element = oldTable[i];
-        int index = indexOfInsertion((K)element);
-        newTable[index] = element;
-        newValues[index] = oldValues[i];
-        newState[index] = FULL;
-      }
-    }
-  }
-
-  /**
-   * Removes the given key with its associated element from the receiver, if present.
-   *
-   * @param key the key to be removed from the receiver.
-   * @return <tt>true</tt> if the receiver contained the specified key, <tt>false</tt> otherwise.
-   */
-  @SuppressWarnings("unchecked")
-  @Override
-  public V remove(Object key) {
-    int i = indexOfKey((K)key);
-    if (i < 0) {
-      return null;
-    }
-    // key not contained
-    V removed = (V) values[i];
-
-    this.state[i] = REMOVED;
-    //this.values[i]=0; // delta
-    this.distinct--;
-
-    if (this.distinct < this.lowWaterMark) {
-      int newCapacity = chooseShrinkCapacity(this.distinct, this.minLoadFactor, this.maxLoadFactor);
-      rehash(newCapacity);
-    }
-
-    return removed;
-  }
-
-  /**
-   * Initializes the receiver.
-   *
-   * @param initialCapacity the initial capacity of the receiver.
-   * @param minLoadFactor   the minLoadFactor of the receiver.
-   * @param maxLoadFactor   the maxLoadFactor of the receiver.
-   * @throws IllegalArgumentException if <tt>initialCapacity < 0 || (minLoadFactor < 0.0 || minLoadFactor >= 1.0) ||
-   *                                  (maxLoadFactor <= 0.0 || maxLoadFactor >= 1.0) || (minLoadFactor >=
-   *                                  maxLoadFactor)</tt>.
-   */
-  @Override
-  protected void setUp(int initialCapacity, double minLoadFactor, double maxLoadFactor) {
-    int capacity = initialCapacity;
-    super.setUp(capacity, minLoadFactor, maxLoadFactor);
-    capacity = nextPrime(capacity);
-    if (capacity == 0) {
-      capacity = 1;
-    } // open addressing needs at least one FREE slot at any time.
-
-    this.table = new Object[capacity];
-    this.values = new Object[capacity];
-    this.state = new byte[capacity];
-
-    // memory will be exhausted long before this pathological case happens, anyway.
-    this.minLoadFactor = minLoadFactor;
-    if (capacity == PrimeFinder.LARGEST_PRIME) {
-      this.maxLoadFactor = 1.0;
-    } else {
-      this.maxLoadFactor = maxLoadFactor;
-    }
-
-    this.distinct = 0;
-    this.freeEntries = capacity; // delta
-
-    // lowWaterMark will be established upon first expansion.
-    // establishing it now (upon instance construction) would immediately make the table shrink upon first put(...).
-    // After all the idea of an "initialCapacity" implies violating lowWaterMarks when an object is young.
-    // See ensureCapacity(...)
-    this.lowWaterMark = 0;
-    this.highWaterMark = chooseHighWaterMark(capacity, this.maxLoadFactor);
-  }
-
-  /**
-   * Trims the capacity of the receiver to be the receiver's current size. Releases any superfluous internal memory. An
-   * application can use this operation to minimize the storage of the receiver.
-   */
-  @Override
-  public void trimToSize() {
-    // * 1.2 because open addressing's performance exponentially degrades beyond that point
-    // so that even rehashing the table can take very long
-    int newCapacity = nextPrime((int) (1 + 1.2 * size()));
-    if (table.length > newCapacity) {
-      rehash(newCapacity);
-    }
-  }
-
-  /**
-   * Access for unit tests.
-   * @param capacity
-   * @param minLoadFactor
-   * @param maxLoadFactor
-   */
-  void getInternalFactors(int[] capacity, 
-      double[] minLoadFactor, 
-      double[] maxLoadFactor) {
-    capacity[0] = table.length;
-    minLoadFactor[0] = this.minLoadFactor;
-    maxLoadFactor[0] = this.maxLoadFactor;
-  }
-
-  private class MapEntry implements Entry<K,V> {
-    private final K key;
-    private final V value;
-    
-    MapEntry(K key, V value) {
-      this.key = key;
-      this.value = value;
-    }
-
-    @Override
-    public K getKey() {
-      return key;
-    }
-
-    @Override
-    public V getValue() {
-      return value;
-    }
-
-    @Override
-    public V setValue(V value) {
-      throw new UnsupportedOperationException("Map.Entry.setValue not supported for OpenHashMap");
-    }
-    
-  }
-
-  /**
-   * Allocate a set to contain Map.Entry objects for the pairs and return it.
-   */
-  @Override
-  public Set<Entry<K,V>> entrySet() {
-    final Set<Entry<K, V>> entries = new OpenHashSet<>();
-    forEachPair(new ObjectObjectProcedure<K,V>() {
-      @Override
-      public boolean apply(K key, V value) {
-        entries.add(new MapEntry(key, value));
-        return true;
-      }
-    });
-    return entries;
-  }
-
-  /**
-   * Allocate a set to contain keys and return it.
-   * This violates the 'backing' provisions of the map interface.
-   */
-  @Override
-  public Set<K> keySet() {
-    final Set<K> keys = new OpenHashSet<>();
-    forEachKey(new ObjectProcedure<K>() {
-      @Override
-      public boolean apply(K element) {
-        keys.add(element);
-        return true;
-      }
-    });
-    return keys;
-  }
-
-  @Override
-  public void putAll(Map<? extends K,? extends V> m) {
-    for (Entry<? extends K, ? extends V> e : m.entrySet()) {
-      put(e.getKey(), e.getValue());
-    }
-  }
-
-  /**
-   * Allocate a list to contain the values and return it.
-   * This violates the 'backing' provision of the Map interface.
-   */
-  @Override
-  public Collection<V> values() {
-    final List<V> valueList = new ArrayList<>();
-    forEachPair(new ObjectObjectProcedure<K,V>() {
-      @Override
-      public boolean apply(K key, V value) {
-        valueList.add(value);
-        return true;
-      }
-    });
-    return valueList;
-  }
-
-  @SuppressWarnings("unchecked")
-  @Override
-  public boolean equals(Object obj) {
-    if (!(obj instanceof OpenHashMap)) {
-      return false;
-    }
-    final OpenHashMap<K,V> o = (OpenHashMap<K,V>) obj;
-    if (o.size() != size()) {
-      return false;
-    }
-    final boolean[] equal = new boolean[1];
-    equal[0] = true;
-    forEachPair(new ObjectObjectProcedure<K,V>() {
-      @Override
-      public boolean apply(K key, V value) {
-        Object ov = o.get(key);
-        if (!value.equals(ov)) {
-          equal[0] = false;
-          return false;
-        }
-        return true;
-      }
-    });
-    return equal[0];
-  }
-
-  @Override
-  public String toString() {
-    final StringBuilder sb = new StringBuilder();
-    sb.append('{');
-    forEachPair(new ObjectObjectProcedure<K,V>() {
-      @Override
-      public boolean apply(K key, V value) {
-        sb.append('[');
-        sb.append(key);
-        sb.append(" -> ");
-        sb.append(value);
-        sb.append("] ");
-        return true;
-      }
-    });
-    sb.append('}');
-    return sb.toString();
-  }
-}

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/map/PrimeFinder.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/map/PrimeFinder.java b/core/src/main/java/org/apache/mahout/math/map/PrimeFinder.java
deleted file mode 100644
index 8f32e86..0000000
--- a/core/src/main/java/org/apache/mahout/math/map/PrimeFinder.java
+++ /dev/null
@@ -1,145 +0,0 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-package org.apache.mahout.math.map;
-
-import java.util.Arrays;
-
-/**
- * Not of interest for users; only for implementors of hashtables.
- * Used to keep hash table capacities prime numbers.
- *
- * <p>Choosing prime numbers as hash table capacities is a good idea to keep them working fast,
- * particularly under hash table expansions.
- *
- * <p>However, JDK 1.2, JGL 3.1 and many other toolkits do nothing to keep capacities prime.
- * This class provides efficient means to choose prime capacities.
- *
- * <p>Choosing a prime is <tt>O(log 300)</tt> (binary search in a list of 300 int's).
- * Memory requirements: 1 KB static memory.
- *
- */
-public final class PrimeFinder {
-
-  /** The largest prime this class can generate; currently equal to <tt>Integer.MAX_VALUE</tt>. */
-  public static final int LARGEST_PRIME = Integer.MAX_VALUE; //yes, it is prime.
-
-  /**
-   * The prime number list consists of 11 chunks. Each chunk contains prime numbers. A chunk starts with a prime P1. The
-   * next element is a prime P2. P2 is the smallest prime for which holds: P2 >= 2*P1. The next element is P3, for which
-   * the same holds with respect to P2, and so on.
-   *
-   * Chunks are chosen such that for any desired capacity >= 1000 the list includes a prime number <= desired capacity *
-   * 1.11 (11%). For any desired capacity >= 200 the list includes a prime number <= desired capacity * 1.16 (16%). For
-   * any desired capacity >= 16 the list includes a prime number <= desired capacity * 1.21 (21%).
-   *
-   * Therefore, primes can be retrieved which are quite close to any desired capacity, which in turn avoids wasting
-   * memory. For example, the list includes 1039,1117,1201,1277,1361,1439,1523,1597,1759,1907,2081. So if you need a
-   * prime >= 1040, you will find a prime <= 1040*1.11=1154.
-   *
-   * Chunks are chosen such that they are optimized for a hashtable growthfactor of 2.0; If your hashtable has such a
-   * growthfactor then, after initially "rounding to a prime" upon hashtable construction, it will later expand to prime
-   * capacities such that there exist no better primes.
-   *
-   * In total these are about 32*10=320 numbers -> 1 KB of static memory needed. If you are stingy, then delete every
-   * second or fourth chunk.
-   */
-
-  private static final int[] PRIME_CAPACITIES = {
-    //chunk #0
-    LARGEST_PRIME,
-
-    //chunk #1
-    5, 11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421, 12853, 25717, 51437, 102877, 205759,
-    411527, 823117, 1646237, 3292489, 6584983, 13169977, 26339969, 52679969, 105359939,
-    210719881, 421439783, 842879579, 1685759167,
-
-    //chunk #2
-    433, 877, 1759, 3527, 7057, 14143, 28289, 56591, 113189, 226379, 452759, 905551, 1811107,
-    3622219, 7244441, 14488931, 28977863, 57955739, 115911563, 231823147, 463646329, 927292699,
-    1854585413,
-
-    //chunk #3
-    953, 1907, 3821, 7643, 15287, 30577, 61169, 122347, 244703, 489407, 978821, 1957651, 3915341,
-    7830701, 15661423, 31322867, 62645741, 125291483, 250582987, 501165979, 1002331963,
-    2004663929,
-
-    //chunk #4
-    1039, 2081, 4177, 8363, 16729, 33461, 66923, 133853, 267713, 535481, 1070981, 2141977, 4283963,
-    8567929, 17135863, 34271747, 68543509, 137087021, 274174111, 548348231, 1096696463,
-
-    //chunk #5
-    31, 67, 137, 277, 557, 1117, 2237, 4481, 8963, 17929, 35863, 71741, 143483, 286973, 573953,
-    1147921, 2295859, 4591721, 9183457, 18366923, 36733847, 73467739, 146935499, 293871013,
-    587742049, 1175484103,
-
-    //chunk #6
-    599, 1201, 2411, 4831, 9677, 19373, 38747, 77509, 155027, 310081, 620171, 1240361, 2480729,
-    4961459, 9922933, 19845871, 39691759, 79383533, 158767069, 317534141, 635068283, 1270136683,
-
-    //chunk #7
-    311, 631, 1277, 2557, 5119, 10243, 20507, 41017, 82037, 164089, 328213, 656429, 1312867,
-    2625761, 5251529, 10503061, 21006137, 42012281, 84024581, 168049163, 336098327, 672196673,
-    1344393353,
-
-    //chunk #8
-    3, 7, 17, 37, 79, 163, 331, 673, 1361, 2729, 5471, 10949, 21911, 43853, 87719, 175447, 350899,
-    701819, 1403641, 2807303, 5614657, 11229331, 22458671, 44917381, 89834777, 179669557,
-    359339171, 718678369, 1437356741,
-
-    //chunk #9
-    43, 89, 179, 359, 719, 1439, 2879, 5779, 11579, 23159, 46327, 92657, 185323, 370661, 741337,
-    1482707, 2965421, 5930887, 11861791, 23723597, 47447201, 94894427, 189788857, 379577741,
-    759155483, 1518310967,
-
-    //chunk #10
-    379, 761, 1523, 3049, 6101, 12203, 24407, 48817, 97649, 195311, 390647, 781301, 1562611,
-    3125257, 6250537, 12501169, 25002389, 50004791, 100009607, 200019221, 400038451, 800076929,
-    1600153859
-  };
-
-
-  static { //initializer
-    // The above prime numbers are formatted for human readability.
-    // To find numbers fast, we sort them once and for all.
-
-    Arrays.sort(PRIME_CAPACITIES);
-  }
-
-  /** Makes this class non instantiable, but still let's others inherit from it. */
-  private PrimeFinder() {
-  }
-
-  /**
-   * Returns a prime number which is {@code <= desiredCapacity} and very close to {@code desiredCapacity}
-   * (within 11% if {@code desiredCapacity <= 1000}).
-   *
-   * @param desiredCapacity the capacity desired by the user.
-   * @return the capacity which should be used for a hashtable.
-   */
-  public static int nextPrime(int desiredCapacity) {
-    int i = Arrays.binarySearch(PRIME_CAPACITIES, desiredCapacity);
-    if (i < 0) {
-      // desired capacity not found, choose next prime greater than desired capacity
-      i = -i - 1; // remember the semantics of binarySearch...
-    }
-    return PRIME_CAPACITIES[i];
-  }
-
-}

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/map/QuickOpenIntIntHashMap.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/map/QuickOpenIntIntHashMap.java b/core/src/main/java/org/apache/mahout/math/map/QuickOpenIntIntHashMap.java
deleted file mode 100644
index a5a54af..0000000
--- a/core/src/main/java/org/apache/mahout/math/map/QuickOpenIntIntHashMap.java
+++ /dev/null
@@ -1,215 +0,0 @@
-///*
-//Copyright � 1999 CERN - European Organization for Nuclear Research.
-//Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose
-//is hereby granted without fee, provided that the above copyright notice appear in all copies and
-//that both that copyright notice and this permission notice appear in supporting documentation.
-//CERN makes no representations about the suitability of this software for any purpose.
-//It is provided "as is" without expressed or implied warranty.
-//*/
-//package org.apache.mahout.math.map;
-//
-///**
-// * Status: Experimental; Do not use for production yet. Hash map holding (key,value) associations of type
-// * <tt>(int-->int)</tt>; Automatically grows and shrinks as needed; Implemented using open addressing with double
-// * hashing. First see the <a href="package-summary.html">package summary</a> and javadoc <a
-// * href="package-tree.html">tree view</a> to get the broad picture.
-// *
-// * Implements open addressing with double hashing, using "Brent's variation". Brent's variation slows insertions a bit
-// * down (not much) but reduces probes (collisions) for successful searches, in particular for large load factors. (It
-// * does not improve unsuccessful searches.) See D. Knuth, Searching and Sorting, 3rd ed., p.533-545
-// *
-// * @author wolfgang.hoschek@cern.ch
-// * @version 1.0, 09/24/99
-// * @see java.util.HashMap
-// */
-//class QuickOpenIntIntHashMap extends OpenIntIntHashMap {
-//  //public int totalProbesSaved = 0; // benchmark only
-//
-//  /** Constructs an empty map with default capacity and default load factors. */
-//  QuickOpenIntIntHashMap() {
-//    this(DEFAULT_CAPACITY);
-//  }
-//
-//  /**
-//   * Constructs an empty map with the specified initial capacity and default load factors.
-//   *
-//   * @param initialCapacity the initial capacity of the map.
-//   * @throws IllegalArgumentException if the initial capacity is less than zero.
-//   */
-//  QuickOpenIntIntHashMap(int initialCapacity) {
-//    this(initialCapacity, DEFAULT_MIN_LOAD_FACTOR, DEFAULT_MAX_LOAD_FACTOR);
-//  }
-//
-//  /**
-//   * Constructs an empty map with the specified initial capacity and the specified minimum and maximum load factor.
-//   *
-//   * @param initialCapacity the initial capacity.
-//   * @param minLoadFactor   the minimum load factor.
-//   * @param maxLoadFactor   the maximum load factor.
-//   * @throws IllegalArgumentException if <tt>initialCapacity < 0 || (minLoadFactor < 0.0 || minLoadFactor >= 1.0) ||
-//   *                                  (maxLoadFactor <= 0.0 || maxLoadFactor >= 1.0) || (minLoadFactor >=
-//   *                                  maxLoadFactor)</tt>.
-//   */
-//  QuickOpenIntIntHashMap(int initialCapacity, double minLoadFactor, double maxLoadFactor) {
-//    setUp(initialCapacity, minLoadFactor, maxLoadFactor);
-//  }
-//
-//  /**
-//   * Associates the given key with the given value. Replaces any old <tt>(key,someOtherValue)</tt> association, if
-//   * existing.
-//   *
-//   * @param key   the key the value shall be associated with.
-//   * @param value the value to be associated.
-//   * @return <tt>true</tt> if the receiver did not already contain such a key; <tt>false</tt> if the receiver did
-//   *         already contain such a key - the new value has now replaced the formerly associated value.
-//   */
-//  @Override
-//  public boolean put(int key, int value) {
-//    /*
-//       This is open addressing with double hashing, using "Brent's variation".
-//       Brent's variation slows insertions a bit down (not much) but reduces probes (collisions) for successful searches,
-//       in particular for large load factors.
-//       (It does not improve unsuccessful searches.)
-//       See D. Knuth, Searching and Sorting, 3rd ed., p.533-545
-//
-//       h1(key) = hash % M
-//       h2(key) = decrement = Max(1, hash/M % M)
-//       M is prime = capacity = table.length
-//       probing positions are table[(h1-j*h2) % M] for j=0,1,...
-//       (M and h2 could also be chosen differently, but h2 is required to be relative prime to M.)
-//    */
-//
-//    int[] tab = table;
-//    byte[] stat = state;
-//    int length = tab.length;
-//
-//    int hash = HashFunctions.hash(key) & 0x7FFFFFFF;
-//    int i = hash % length;
-//    int decrement = (hash / length) % length;
-//    if (decrement == 0) {
-//      decrement = 1;
-//    }
-//
-//    // stop if we find a removed or free slot, or if we find the key itself
-//    // do NOT skip over removed slots (yes, open addressing is like that...)
-//    //int comp = comparisons;
-//    int t = 0;  // the number of probes
-//    int p0 = i; // the first position to probe
-//    while (stat[i] == FULL && tab[i] != key) {
-//      t++;
-//      i -= decrement;
-//      //hashCollisions++;
-//      if (i < 0) {
-//        i += length;
-//      }
-//    }
-//    if (stat[i] == FULL) {
-//      // key already contained at slot i.
-//      this.values[i] = value;
-//      return false;
-//    }
-//    // not already contained, should be inserted at slot i.
-//
-//    if (this.distinct > this.highWaterMark) {
-//      int newCapacity = chooseGrowCapacity(this.distinct + 1, this.minLoadFactor, this.maxLoadFactor);
-//      rehash(newCapacity);
-//      return put(key, value);
-//    }
-//
-//    /*
-//    Brent's variation does a local reorganization to reduce probes. It essentially means:
-//    We test whether it is possible to move the association we probed first (table[p0]) out of the way.
-//    If this is possible, it will reduce probes for the key to be inserted, since it takes its place;
-//    it gets hit earlier.
-//    However, future probes for the key that we move out of the way will increase.
-//    Thus we only move it out of the way, if we have a net gain, that is, if we save more probes than we loose.
-//    For the first probe we safe more than we loose if the number of probes we needed was >=2 (t>=2).
-//    If the first probe cannot be moved out of the way, we try the next probe (p1).
-//    Now we safe more than we loose if t>=3.
-//    We repeat this until we find that we cannot gain or that we can indeed move p(x) out of the way.
-//
-//    Note: Under the great majority of insertions t<=1, so the loop is entered very infrequently.
-//    */
-//    while (t > 1) {
-//      int key0 = tab[p0];
-//      hash = HashFunctions.hash(key0) & 0x7FFFFFFF;
-//      decrement = (hash / length) % length;
-//      if (decrement == 0) {
-//        decrement = 1;
-//      }
-//      int pc = p0 - decrement; // pc = (p0-j*decrement) % M, j=1,2,..
-//      if (pc < 0) {
-//        pc += length;
-//      }
-//
-//      if (stat[pc] != FREE) { // not a free slot, continue searching for free slot to move to, or break.
-//        p0 = pc;
-//        t--;
-//      } else { // free or removed slot found, now move...
-//        tab[pc] = key0;
-//        stat[pc] = FULL;
-//        values[pc] = values[p0];
-//        i = p0; // prepare to insert: table[p0]=key
-//        t = 0; // break loop
-//      }
-//    }
-//
-//    this.table[i] = key;
-//    this.values[i] = value;
-//    if (this.state[i] == FREE) {
-//      this.freeEntries--;
-//    }
-//    this.state[i] = FULL;
-//    this.distinct++;
-//
-//    if (this.freeEntries < 1) { //delta
-//      int newCapacity = chooseGrowCapacity(this.distinct + 1, this.minLoadFactor, this.maxLoadFactor);
-//      rehash(newCapacity);
-//    }
-//
-//    return true;
-//  }
-//
-//  /**
-//   * Rehashes the contents of the receiver into a new table with a smaller or larger capacity. This method is called
-//   * automatically when the number of keys in the receiver exceeds the high water mark or falls below the low water
-//   * mark.
-//   */
-//  @Override
-//  protected void rehash(int newCapacity) {
-//    int oldCapacity = table.length;
-//    //if (oldCapacity == newCapacity) return;
-//
-//    int[] oldTable = table;
-//    int[] oldValues = values;
-//    byte[] oldState = state;
-//
-//    int[] newTable = new int[newCapacity];
-//    int[] newValues = new int[newCapacity];
-//    byte[] newState = new byte[newCapacity];
-//
-//    this.lowWaterMark = chooseLowWaterMark(newCapacity, this.minLoadFactor);
-//    this.highWaterMark = chooseHighWaterMark(newCapacity, this.maxLoadFactor);
-//
-//    this.table = newTable;
-//    this.values = newValues;
-//    this.state = newState;
-//    this.freeEntries = newCapacity - this.distinct; // delta
-//
-//    int tmp = this.distinct;
-//    this.distinct = Integer.MIN_VALUE; // switch of watermarks
-//    for (int i = oldCapacity; i-- > 0;) {
-//      if (oldState[i] == FULL) {
-//        put(oldTable[i], oldValues[i]);
-//        /*
-//        int element = oldTable[i];
-//        int index = indexOfInsertion(element);
-//        newTable[index]=element;
-//        newValues[index]=oldValues[i];
-//        newState[index]=FULL;
-//        */
-//      }
-//    }
-//    this.distinct = tmp;
-//  }
-//}

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/map/package-info.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/map/package-info.java b/core/src/main/java/org/apache/mahout/math/map/package-info.java
deleted file mode 100644
index 9356f45..0000000
--- a/core/src/main/java/org/apache/mahout/math/map/package-info.java
+++ /dev/null
@@ -1,250 +0,0 @@
-/**
- * <HTML>
- * <BODY>
- * Automatically growing and shrinking maps holding objects or primitive
- * data types such as <tt>int</tt>, <tt>double</tt>, etc. Currently all maps are
- * based upon hashing.
- * <h2><a name="Overview"></a>1. Overview</h2>
- * <p>The map package offers flexible object oriented abstractions modelling automatically
- * resizing maps. It is designed to be scalable in terms of performance and memory
- * requirements.</p>
- * <p>Features include: </p>
- * <p></p>
- * <ul>
- * <li>Maps operating on objects as well as all primitive data types such as <code>int</code>,
- * <code>double</code>, etc.
- * </li>
- * <li>Compact representations</li>
- * <li>Support for quick access to associations</li>
- * <li>A number of general purpose map operations</li>
- * </ul>
- * <p>File-based I/O can be achieved through the standard Java built-in serialization
- * mechanism. All classes implement the {@link java.io.Serializable} interface.
- * However, the toolkit is entirely decoupled from advanced I/O. It provides data
- * structures and algorithms only.
- * <p> This toolkit borrows some terminology from the Javasoft <a
- * href="http://www.javasoft.com/products/jdk/1.2/docs/guide/collections/index.html">
- * Collections framework</a> written by Josh Bloch and introduced in JDK 1.2.
- * <h2>2. Introduction</h2>
- * <p>A map is an associative container that manages a set of (key,value) pairs.
- * It is useful for implementing a collection of one-to-one mappings. A (key,value)
- * pair is called an <i>association</i>. A value can be looked up up via its key.
- * Associations can quickly be set, removed and retrieved. They are stored in a
- * hashing structure based on the hash code of their keys, which is obtained by
- * using a hash function. </p>
- * <p> A map can, for example, contain <tt>Name-->Location</tt> associations like
- * <tt>{("Pete", "Geneva"), ("Steve", "Paris"), ("Robert", "New York")}</tt> used
- * in address books or <tt>Index-->Value</tt> mappings like <tt>{(0, 100), (3,
- * 1000), (100000, 70)}</tt> representing sparse lists or matrices. For example
- * this could mean at index 0 we have a value of 100, at index 3 we have a value
- * of 1000, at index 1000000 we have a value of 70, and at all other indexes we
- * have a value of, say, zero. Another example is a map of IP addresses to domain
- * names (DNS). Maps can also be useful to represent<i> multi sets</i>, that is,
- * sets where elements can occur more than once. For multi sets one would have
- * <tt>Value-->Frequency</tt> mappings like <tt>{(100, 1), (50, 1000), (101, 3))}</tt>
- * meaning element 100 occurs 1 time, element 50 occurs 1000 times, element 101
- * occurs 3 times. Further, maps can also manage <tt>ObjectIdentifier-->Object</tt>
- * mappings like <tt>{(12, obj1), (7, obj2), (10000, obj3), (9, obj4)}</tt> used
- * in Object Databases.
- * <p> A map cannot contain two or more <i>equal</i> keys; a key can map to at most
- * one value. However, more than one key can map to identical values. For primitive
- * data types "equality" of keys is defined as identity (operator <tt>==</tt>).
- * For maps using <tt>Object</tt> keys, the meaning of "equality" can be specified
- * by the user upon instance construction. It can either be defined to be identity
- * (operator <tt>==</tt>) or to be given by the method {@link java.lang.Object#equals(Object)}.
- * Associations of kind <tt>(AnyType,Object)</tt> can be of the form <tt>(AnyKey,null)
- * </tt>, i.e. values can be <tt>null</tt>.
- * <p> The classes of this package make no guarantees as to the order of the elements
- * returned by iterators; in particular, they do not guarantee that the order will
- * remain constant over time.
- * <h2></h2>
- * <h4>Copying</h4>
- * <p>
- * <p>Any map can be copied. A copy is <i>equal</i> to the original but entirely
- * independent of the original. So changes in the copy are not reflected in the
- * original, and vice-versa.
- * <h2>3. Package organization</h2>
- * <p>For most primitive data types and for objects there exists a separate map version.
- * All versions are just the same, except that they operate on different data types.
- * Colt includes two kinds of implementations for maps: The two different implementations
- * are tagged <b>Chained</b> and <b>Open</b>.
- * Note: Chained is no more included. Wherever it is mentioned it is of historic interest only.</p>
- * <ul>
- * <li><b>Chained</b> uses extendible separate chaining with chains holding unsorted
- * dynamically linked collision lists.
- * <li><b>Open</b> uses extendible open addressing with double hashing.
- * </ul>
- * <p>Class naming follows the schema <tt>&lt;Implementation&gt;&lt;KeyType&gt;&lt;ValueType&gt;HashMap</tt>.
- * For example, a {@link org.apache.mahout.math.map.OpenIntDoubleHashMap} holds <tt>(int-->double)</tt>
- * associations and is implemented with open addressing. A {@link org.apache.mahout.math.map.OpenIntObjectHashMap}
- * holds <tt>(int-->Object)</tt> associations and is implemented with open addressing.
- * </p>
- * <p>The classes for maps of a given (key,value) type are derived from a common
- * abstract base class tagged <tt>Abstract&lt;KeyType&gt;&lt;ValueType&gt;</tt><tt>Map</tt>.
- * For example, all maps operating on <tt>(int-->double)</tt> associations are
- * derived from {@link org.apache.mahout.math.map.AbstractIntDoubleMap}, which in turn is derived
- * from an abstract base class tying together all maps regardless of assocation
- * type, {@link org.apache.mahout.math.set.AbstractSet}. The abstract base classes provide skeleton
- * implementations for all but few methods. Experimental layouts (such as chaining,
- * open addressing, extensible hashing, red-black-trees, etc.) can easily be implemented
- * and inherit a rich set of functionality. Have a look at the javadoc <a href="package-tree.html">tree
- * view</a> to get the broad picture.</p>
- * <h2>4. Example usage</h2>
- * <TABLE>
- * <TD CLASS="PRE">
- * <PRE>
- * int[]    keys   = {0    , 3     , 100000, 9   };
- * double[] values = {100.0, 1000.0, 70.0  , 71.0};
- * AbstractIntDoubleMap map = new OpenIntDoubleHashMap();
- * // add several associations
- * for (int i=0; i &lt; keys.length; i++) map.put(keys[i], values[i]);
- * log.info("map="+map);
- * log.info("size="+map.size());
- * log.info(map.containsKey(3));
- * log.info("get(3)="+map.get(3));
- * log.info(map.containsKey(4));
- * log.info("get(4)="+map.get(4));
- * log.info(map.containsValue(71.0));
- * log.info("keyOf(71.0)="+map.keyOf(71.0));
- * // remove one association
- * map.removeKey(3);
- * log.info("\nmap="+map);
- * log.info(map.containsKey(3));
- * log.info("get(3)="+map.get(3));
- * log.info(map.containsValue(1000.0));
- * log.info("keyOf(1000.0)="+map.keyOf(1000.0));
- * // clear
- * map.clear();
- * log.info("\nmap="+map);
- * log.info("size="+map.size());
- * </PRE>
- * </TD>
- * </TABLE>
- * yields the following output
- * <TABLE>
- * <TD CLASS="PRE">
- * <PRE>
- * map=[0->100.0, 3->1000.0, 9->71.0, 100000->70.0]
- * size=4
- * true
- * get(3)=1000.0
- * false
- * get(4)=0.0
- * true
- * keyOf(71.0)=9
- * map=[0->100.0, 9->71.0, 100000->70.0]
- * false
- * get(3)=0.0
- * false
- * keyOf(1000.0)=-2147483648
- * map=[]
- * size=0
- * </PRE>
- * </TD>
- * </TABLE>
- * <h2> 5. Notes </h2>
- * <p>
- * Note that implementations are not synchronized.
- * <p>
- * Choosing efficient parameters for hash maps is not always easy.
- * However, since parameters determine efficiency and memory requirements, here is a quick guide how to choose them.
- * If your use case does not heavily operate on hash maps but uses them just because they provide
- * convenient functionality, you can safely skip this section.
- * For those of you who care, read on.
- * <p>
- * There are three parameters that can be customized upon map construction: <tt>initialCapacity</tt>,
- * <tt>minLoadFactor</tt> and <tt>maxLoadFactor</tt>.
- * The more memory one can afford, the faster a hash map.
- * The hash map's capacity is the maximum number of associations that can be added without needing to allocate new
- * internal memory.
- * A larger capacity means faster adding, searching and removing.
- * The <tt>initialCapacity</tt> corresponds to the capacity used upon instance construction.
- * <p>
- * The <tt>loadFactor</tt> of a hash map measures the degree of "fullness".
- * It is given by the number of assocations (<tt>size()</tt>)
- * divided by the hash map capacity <tt>(0.0 &lt;= loadFactor &lt;= 1.0)</tt>.
- * The more associations are added, the larger the loadFactor and the more hash map performance degrades.
- * Therefore, when the loadFactor exceeds a customizable threshold (<tt>maxLoadFactor</tt>), the hash map is
- * automatically grown.
- * In such a way performance degradation can be avoided.
- * Similarly, when the loadFactor falls below a customizable threshold (<tt>minLoadFactor</tt>), the hash map is
- * automatically shrinked.
- * In such a way excessive memory consumption can be avoided.
- * Automatic resizing (both growing and shrinking) obeys the following invariant:
- * <p>
- * <tt>capacity * minLoadFactor <= size() <= capacity * maxLoadFactor</tt>
- * <p> The term <tt>capacity * minLoadFactor</tt> is called the <i>low water mark</i>,
- * <tt>capacity * maxLoadFactor</tt> is called the <i>high water mark</i>. In other
- * words, the number of associations may vary within the water mark constraints.
- * When it goes out of range, the map is automatically resized and memory consumption
- * changes proportionally.
- * <ul>
- * <li>To tune for memory at the expense of performance, both increase <tt>minLoadFactor</tt> and
- * <tt>maxLoadFactor</tt>.
- * <li>To tune for performance at the expense of memory, both decrease <tt>minLoadFactor</tt> and
- * <tt>maxLoadFactor</tt>.
- * As as special case set <tt>minLoadFactor=0</tt> to avoid any automatic shrinking.
- * </ul>
- * Resizing large hash maps can be time consuming, <tt>O(size())</tt>, and should be avoided if possible (maintaining
- * primes is not the reason).
- * Unnecessary growing operations can be avoided if the number of associations is known before they are added, or can be
- * estimated.<p>
- * In such a case good parameters are as follows:
- * <p>
- * <i>For chaining:</i>
- * <br>Set the <tt>initialCapacity = 1.4*expectedSize</tt> or greater.
- * <br>Set the <tt>maxLoadFactor = 0.8</tt> or greater.
- * <p>
- * <i>For open addressing:</i>
- * <br>Set the <tt>initialCapacity = 2*expectedSize</tt> or greater. Alternatively call <tt>ensureCapacity(...)</tt>.
- * <br>Set the <tt>maxLoadFactor = 0.5</tt>.
- * <br>Never set <tt>maxLoadFactor &gt; 0.55</tt>; open addressing exponentially slows down beyond that point.
- * <p>
- * In this way the hash map will never need to grow and still stay fast.
- * It is never a good idea to set <tt>maxLoadFactor &lt; 0.1</tt>,
- * because the hash map would grow too often.
- * If it is entirelly unknown how many associations the application will use,
- * the default constructor should be used. The map will grow and shrink as needed.
- * <p>
- * <b>Comparision of chaining and open addressing</b>
- * <p> Chaining is faster than open addressing, when assuming unconstrained memory
- * consumption. Open addressing is more space efficient than chaining, because
- * it does not create entry objects but uses primitive arrays which are considerably
- * smaller. Entry objects consume significant amounts of memory compared to the
- * information they actually hold. Open addressing also poses no problems to the
- * garbage collector. In contrast, chaining can create millions of entry objects
- * which are linked; a nightmare for any garbage collector. In addition, entry
- * object creation is a bit slow. <br>
- * Therefore, with the same amount of memory, or even less memory, hash maps with
- * larger capacity can be maintained under open addressing, which yields smaller
- * loadFactors, which in turn keeps performance competitive with chaining. In our
- * benchmarks, using significantly less memory, open addressing usually is not
- * more than 1.2-1.5 times slower than chaining.
- * <p><b>Further readings</b>:
- * <br>Knuth D., The Art of Computer Programming: Searching and Sorting, 3rd ed.
- * <br>Griswold W., Townsend G., The Design and Implementation of Dynamic Hashing for Sets and Tables in Icon,
- * Software - Practice and Experience, Vol. 23(4), 351-367 (April 1993).
- * <br>Larson P., Dynamic hash tables, Comm. of the ACM, 31, (4), 1988.
- * <p>
- * <b>Performance:</b>
- * <p>
- * Time complexity:
- * <br>The classes offer <i>expected</i> time complexity <tt>O(1)</tt> (i.e. constant time) for the basic operations
- * <tt>put</tt>, <tt>get</tt>, <tt>removeKey</tt>, <tt>containsKey</tt> and <tt>size</tt>,
- * assuming the hash function disperses the elements properly among the buckets.
- * Otherwise, pathological cases, although highly improbable, can occur, degrading performance to <tt>O(N)</tt> in the
- * worst case.
- * Operations <tt>containsValue</tt> and <tt>keyOf</tt> are <tt>O(N)</tt>.
- * <p>
- * Memory requirements for <i>open addressing</i>:
- * <br>worst case: <tt>memory [bytes] = (1/minLoadFactor) * size() * (1 + sizeOf(key) + sizeOf(value))</tt>.
- * <br>best case: <tt>memory [bytes] = (1/maxLoadFactor) * size() * (1 + sizeOf(key) + sizeOf(value))</tt>.
- * Where <tt>sizeOf(int) = 4</tt>, <tt>sizeOf(double) = 8</tt>, <tt>sizeOf(Object) = 4</tt>, etc.
- * Thus, an <tt>OpenIntIntHashMap</tt> with minLoadFactor=0.25 and maxLoadFactor=0.5 and 1000000 associations uses
- * between 17 MB and 34 MB.
- * The same map with 1000 associations uses between 17 and 34 KB.
- * <p>
- * </BODY>
- * </HTML>
- */
-package org.apache.mahout.math.map;

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/package-info.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/package-info.java b/core/src/main/java/org/apache/mahout/math/package-info.java
deleted file mode 100644
index de664f0..0000000
--- a/core/src/main/java/org/apache/mahout/math/package-info.java
+++ /dev/null
@@ -1,4 +0,0 @@
-/**
- * Core base classes; Operations on primitive arrays such as sorting, partitioning and permuting.
- */
-package org.apache.mahout.math;

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/set/AbstractSet.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/set/AbstractSet.java b/core/src/main/java/org/apache/mahout/math/set/AbstractSet.java
deleted file mode 100644
index 7691420..0000000
--- a/core/src/main/java/org/apache/mahout/math/set/AbstractSet.java
+++ /dev/null
@@ -1,188 +0,0 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-/*
-Copyright 1999 CERN - European Organization for Nuclear Research.
-Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose 
-is hereby granted without fee, provided that the above copyright notice appear in all copies and 
-that both that copyright notice and this permission notice appear in supporting documentation. 
-CERN makes no representations about the suitability of this software for any purpose. 
-It is provided "as is" without expressed or implied warranty.
-*/
-package org.apache.mahout.math.set;
-
-import org.apache.mahout.math.PersistentObject;
-import org.apache.mahout.math.map.PrimeFinder;
-
-public abstract class AbstractSet extends PersistentObject {
-  //public static boolean debug = false; // debug only
-
-  /** The number of distinct associations in the map; its "size()". */
-  protected int distinct;
-
-  /**
-   * The table capacity c=table.length always satisfies the invariant <tt>c * minLoadFactor <= s <= c *
-   * maxLoadFactor</tt>, where s=size() is the number of associations currently contained. The term "c * minLoadFactor"
-   * is called the "lowWaterMark", "c * maxLoadFactor" is called the "highWaterMark". In other words, the table capacity
-   * (and proportionally the memory used by this class) oscillates within these constraints. The terms are precomputed
-   * and cached to avoid recalculating them each time put(..) or removeKey(...) is called.
-   */
-  protected int lowWaterMark;
-  protected int highWaterMark;
-
-  /** The minimum load factor for the hashtable. */
-  protected double minLoadFactor;
-
-  /** The maximum load factor for the hashtable. */
-  protected double maxLoadFactor;
-
-  // these are public access for unit tests.
-  public static final int DEFAULT_CAPACITY = 277;
-  public static final double DEFAULT_MIN_LOAD_FACTOR = 0.2;
-  public static final double DEFAULT_MAX_LOAD_FACTOR = 0.5;
-
-  /**
-   * Chooses a new prime table capacity optimized for growing that (approximately) satisfies the invariant <tt>c *
-   * minLoadFactor <= size <= c * maxLoadFactor</tt> and has at least one FREE slot for the given size.
-   */
-  protected int chooseGrowCapacity(int size, double minLoad, double maxLoad) {
-    return nextPrime(Math.max(size + 1, (int) ((4 * size / (3 * minLoad + maxLoad)))));
-  }
-
-  /**
-   * Returns new high water mark threshold based on current capacity and maxLoadFactor.
-   *
-   * @return int the new threshold.
-   */
-  protected int chooseHighWaterMark(int capacity, double maxLoad) {
-    return Math.min(capacity - 2, (int) (capacity * maxLoad)); //makes sure there is always at least one FREE slot
-  }
-
-  /**
-   * Returns new low water mark threshold based on current capacity and minLoadFactor.
-   *
-   * @return int the new threshold.
-   */
-  protected int chooseLowWaterMark(int capacity, double minLoad) {
-    return (int) (capacity * minLoad);
-  }
-
-  /**
-   * Chooses a new prime table capacity neither favoring shrinking nor growing, that (approximately) satisfies the
-   * invariant <tt>c * minLoadFactor <= size <= c * maxLoadFactor</tt> and has at least one FREE slot for the given
-   * size.
-   */
-  protected int chooseMeanCapacity(int size, double minLoad, double maxLoad) {
-    return nextPrime(Math.max(size + 1, (int) ((2 * size / (minLoad + maxLoad)))));
-  }
-
-  /**
-   * Chooses a new prime table capacity optimized for shrinking that (approximately) satisfies the invariant <tt>c *
-   * minLoadFactor <= size <= c * maxLoadFactor</tt> and has at least one FREE slot for the given size.
-   */
-  protected int chooseShrinkCapacity(int size, double minLoad, double maxLoad) {
-    return nextPrime(Math.max(size + 1, (int) ((4 * size / (minLoad + 3 * maxLoad)))));
-  }
-
-  /** Removes all (key,value) associations from the receiver. */
-  public abstract void clear();
-
-  /**
-   * Ensures that the receiver can hold at least the specified number of elements without needing to allocate new
-   * internal memory. If necessary, allocates new internal memory and increases the capacity of the receiver. <p> This
-   * method never need be called; it is for performance tuning only. Calling this method before <tt>put()</tt>ing a
-   * large number of associations boosts performance, because the receiver will grow only once instead of potentially
-   * many times. <p> <b>This default implementation does nothing.</b> Override this method if necessary.
-   *
-   * @param minCapacity the desired minimum capacity.
-   */
-  public void ensureCapacity(int minCapacity) {
-  }
-
-  /**
-   * Returns <tt>true</tt> if the receiver contains no (key,value) associations.
-   *
-   * @return <tt>true</tt> if the receiver contains no (key,value) associations.
-   */
-  public boolean isEmpty() {
-    return distinct == 0;
-  }
-
-  /**
-   * Returns a prime number which is <code>&gt;= desiredCapacity</code> and very close to <code>desiredCapacity</code>
-   * (within 11% if <code>desiredCapacity &gt;= 1000</code>).
-   *
-   * @param desiredCapacity the capacity desired by the user.
-   * @return the capacity which should be used for a hashtable.
-   */
-  protected int nextPrime(int desiredCapacity) {
-    return PrimeFinder.nextPrime(desiredCapacity);
-  }
-
-  /**
-   * Initializes the receiver. You will almost certainly need to override this method in subclasses to initialize the
-   * hash table.
-   *
-   * @param initialCapacity the initial capacity of the receiver.
-   * @param minLoadFactor   the minLoadFactor of the receiver.
-   * @param maxLoadFactor   the maxLoadFactor of the receiver.
-   * @throws IllegalArgumentException if <tt>initialCapacity < 0 || (minLoadFactor < 0.0 || minLoadFactor >= 1.0) ||
-   *                                  (maxLoadFactor <= 0.0 || maxLoadFactor >= 1.0) || (minLoadFactor >=
-   *                                  maxLoadFactor)</tt>.
-   */
-  protected void setUp(int initialCapacity, double minLoadFactor, double maxLoadFactor) {
-    if (initialCapacity < 0) {
-      throw new IllegalArgumentException("Initial Capacity must not be less than zero: " + initialCapacity);
-    }
-    if (minLoadFactor < 0.0 || minLoadFactor >= 1.0) {
-      throw new IllegalArgumentException("Illegal minLoadFactor: " + minLoadFactor);
-    }
-    if (maxLoadFactor <= 0.0 || maxLoadFactor >= 1.0) {
-      throw new IllegalArgumentException("Illegal maxLoadFactor: " + maxLoadFactor);
-    }
-    if (minLoadFactor >= maxLoadFactor) {
-      throw new IllegalArgumentException(
-          "Illegal minLoadFactor: " + minLoadFactor + " and maxLoadFactor: " + maxLoadFactor);
-    }
-  }
-
-  /**
-   * Returns the number of (key,value) associations currently contained.
-   *
-   * @return the number of (key,value) associations currently contained.
-   */
-  public int size() {
-    return distinct;
-  }
-
-  /**
-   * Trims the capacity of the receiver to be the receiver's current size. Releases any superfluous internal memory. An
-   * application can use this operation to minimize the storage of the receiver. <p> This default implementation does
-   * nothing. Override this method if necessary.
-   */
-  public void trimToSize() {
-  }
-  
-  protected static boolean equalsMindTheNull(Object a, Object b) {
-    if (a == null && b == null) {
-      return true;
-    }
-    if (a == null || b == null) {
-      return false;
-    }
-    return a.equals(b);
-  }
-}

http://git-wip-us.apache.org/repos/asf/mahout/blob/49ad8cb4/core/src/main/java/org/apache/mahout/math/set/HashUtils.java
----------------------------------------------------------------------
diff --git a/core/src/main/java/org/apache/mahout/math/set/HashUtils.java b/core/src/main/java/org/apache/mahout/math/set/HashUtils.java
deleted file mode 100644
index f5dfeb0..0000000
--- a/core/src/main/java/org/apache/mahout/math/set/HashUtils.java
+++ /dev/null
@@ -1,56 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.mahout.math.set;
-
-/**
- * Computes hashes of primitive values.  Providing these as statics allows the templated code
- * to compute hashes of sets.
- */
-public final class HashUtils {
-
-  private HashUtils() {
-  }
-
-  public static int hash(byte x) {
-    return x;
-  }
-
-  public static int hash(short x) {
-    return x;
-  }
-
-  public static int hash(char x) {
-    return x;
-  }
-
-  public static int hash(int x) {
-    return x;
-  }
-
-  public static int hash(float x) {
-    return Float.floatToIntBits(x) >>> 3 + Float.floatToIntBits((float) (Math.PI * x));
-  }
-
-  public static int hash(double x) {
-    return hash(17 * Double.doubleToLongBits(x));
-  }
-
-  public static int hash(long x) {
-    return (int) ((x * 11) >>> 32 ^ x);
-  }
-}